perm filename V2Q.TEX[TEX,DEK] blob
sn#376683 filedate 1978-09-02 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00014 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00003 00002 %folio 725 galley 1 (C) Addison-Wesley 1978 -
C00022 00003 %folio 731 galley 2a (C) Addison-Wesley 1978 -
C00032 00004 %folio 733 galley 2b (C) Addison-Wesley 1978 -
C00037 00005 %folio 735 galley 3a (C) Addison-Wesley 1978 -
C00043 00006 %folio 738 galley 3b (C) Addison-Wesley 1978 -
C00054 00007 %folio 742 galley 4 (C) Addison-Wesley 1978 -
C00065 00008 %folio 745 galley 4b (C) Addison-Wesley 1978 -
C00074 00009 %folio 748 galley 5 (C) Addison-Wesley 1978 -
C00090 00010 %folio 754 galley 6 (C) Addison-Wesley 1978 -
C00105 00011 %folio 758 galley 7a (C) Addison-Wesley 1978 -
C00115 00012 %folio 760 galley 7b (C) Addison-Wesley 1978 -
C00120 00013 %folio 761 galley 8 (C) Addison-Wesley 1978 -
C00131 00014
C00133 ENDMK
C⊗;
%folio 725 galley 1 (C) Addison-Wesley 1978 -
|newtype 58320---Computer Programming\quad (Knuth/Addison-Wesley)\quad
f. 725\quad Ans\quad g. 1c
$$ \left[${\ldotss , a↓3, a↓2, a↓1, a↓0; a↓{-1}, a↓{-2}, . .
$.
|qleft 5\ldotss , b$↓3, b↓2, b↓1, b↓0; b↓{-1}, b↓{-2}, . . .}$}
= \left[${\ldotss , A↓3, A↓2, A↓1, A↓0: A↓{-1}, A↓{-2}, . . .\atop
\ldotss , B↓3, B↓2, B↓1, B↓0: B↓{-1}, B↓{-2}, . . .}$}
|qleft \4skip if\qquad $A↓j = \left[{a↓k|infinf j|infinf +|infinf
1↓{-1}, a↓k|infinf j|infinf +|infinf 1↓{-2}, \ldotss , a↓k|infinf
j\atop b↓k|infinf j|infinf +|infinf 1↓{-2}, \ldotss , b↓k|infinf
j}}, B↓j = b↓k|infinf j|infinf +|infinf 1↓{-1} \ldotss b↓k|infinf
j$,
|qctr \9skip where \langle $k↓n\rangle$ is any infinite sequence
of integers with $k↓{j+1} > k↓j$.
\ansno 11. (The following algorithm works both for addition
or subtraction, depending on whether the plus or minus sign
is chosen.)
|qleft \qquad Start by setting $k ← a↓{n+1} ← a↓{n+2} ← b↓{n+1}
← b↓{n+2} ← 0$; then for $m = 0, 1, . . . , n + 2$ do the following
: Set $c↓m ← a↓m \pm b↓m + k$; then if $c↓m ≥ 2$, set $k ← -1$
and $c↓m ← c↓m - 2$; if $c↓m < 0$, set $k ← 1$ and $c↓m ← c↓m
+ 2$; and if 0 ≤ $c↓m ≤ 1$, set $k ← 0$.
\ansno 12. (a) Subtract $(\ldotsm a↓30a↓10)↓{-2}$ from $(\ldotsm a↓40a↓20a↓0)↓{-2}$
in the negative binary system.\xskip (b) Subtract $(\ldotsm b↓30b↓10)↓2$
from $(\ldotsm b↓40b↓20b↓0)↓2$ in the binary system.
\ansno 13. $(1.90909090\ldotsm)↓{-10} = (0.090909\ldotsm)↓{-10} = {1\over 11}$.
\ansno 14. \qquad \qquad |tab 1 1 3 2 1\qquad |tab [5 - 4$i]⊗⊗1
1 3 2 1⊗[5 - 4i]\cr
\-2skip ⊗---------------\cr
⊗1 1 3 2 1\cr
\hfill 1 ⊗1 2 0 2\cr
\hfill 1 2 ⊗1 2 3\cr
\hfill 1 1 3 ⊗2 1\cr
\hfill 1 1 3 2 ⊗1\cr
\-2skip --------------------------------$-
|qleft \hfill 0 1 0 3 ⊗1 1 2 0 1⊗[9 - 40i]\cr
\3skip {\bf 15.} [$-{10\over 11}$, ${1\over 11}$], and the rectangle
shown in Fig.\ A--6.
|qleft \4skip {\bf Fig.\ A<6.} Fundamental region for quater-imaginary
numbers.
\ansno 16. It is tempting to try to do this in a very simple
way, by using the rule 2 = (1100)$↓{i-1}$ to take care of carries;
but that leads to a nonterminating method if we try to add one
to (11101)$↓{i-1} = -1$.
|qleft \qquad The following solution does this by providing
four related algorithms (namely for adding or subtracting 1
or $i)$. If $α$ is a string of zeros and ones, let $α↑P$ be
a string of zeros and ones such that $(α↑P)↓{i-1} = (α↑P)↓{i-1}
+ 1$; and let $α↑{-P}, α↑Q, α↑{-Q}$ be defined similarly, with
$-1$, $+i$, and $-i$ respectively in place of $+1$. Then
$$$$
|qctr \hfill (α0)$↑P ⊗= α1; (αx1)↑P = α↑Qx0.\hfill (α0)↑Q ⊗=
α↑P1; (α1)↑Q = α↑{-Q}0.\cr
\hfill (αx0)↑{-P} ⊗= α↑{-Q}x1; (α1)↑{-P} = α0.\hfill (α0)↑{-Q}
⊗= α↑Q1; (α1)↑{-Q} = α↑{-P}0.\cr
\9skip$ Here $x$ stands for either 0 or 1, and the strings are
extended on the left with zeros if necessary. The processes
will clearly always terminate. Hence every number of the form
$a + bi$ with $a, b$ integers is representable in the $i - 1$
system.
\ansno 17. No; the number $-1$ cannot be so represented. (This
can be proved by constructing a set $S$ as in Fig.\ 1. We have
the representations $-i = (0.1111111 . . .)↓{1+i}, i = (100.1111111
. . .)↑{1+i}$, but $S$ contains no integer points besides 0
and $↓-i.)$ See exercise 28, however.
\ansno 18. Let $S↓0$ be the set of points $(a↓7a↓6a↓5a↓4a↓3a↓2a↓1a↓0)↓{i-1}$,
where each $a↓k$ is 0 or 1. $(S↓0$ is given by the 256 interior
dots shown in Fig.\ 1, if that picture is multiplied by 16.)
We first show that $S$ is closed: If $y↓1, y↓2, . . $. is an
infinite subset of $S$, we have $y↓n = \sum ↓{k|}¬R↓1 a↓{nk}16↑{-k}$,
where each $a↓{nk}$ is in $S↓0$. Construct a tree whose nodes
are $(a↓{n1}, \ldotss , a↓{nr})$, for $1 ≤ r ≤ n$, and let a
node of this tree be an ancestor of another node if it is an
initial subsequence of that node. By the infinity lemma this
tree has an infinite path $(a↓1, a↓2, a↓3, . . .)$, and so $\sum
↓{k|}¬R↓1 a↓k16↑{-k}$ is a limit point of $\{y↓1, y↓2, . . .\}$
in $S$.
|qleft \qquad By the answer to exercise 16, all numbers of the
form $(a + bi)/16↑k$ are representable, when $a$ and $b$ are
integers. Therefore if $x$ and $y$ are arbitrary reals and $k
≥ 1$, the number $z↓k = (\lfloor 16↑kx\rfloor + \lfloor 16↑ky\rfloor
i)/16↑k$ is in $S + m + ni$ for some integers $m$ and $n$. It
can be shown that $S + m + ni$ is bounded away from the origin
when $(m, n) ≠ (0, 0)$. Consequently if $|x|$ and $|y|$ are
sufficiently small and $k$ is sufficiently large, we have $z↓k
\in S$, and lim$↓{k|}¬M↓|¬X z↓k = x + yi \in S$.
\ansno 19. If $m > u$ or $m < l$, find $a \in D$ such that $m
≡ a$ (modulo $b)$; the desired representation will be a representation
of $m↑\prime = (m - a)/b$ followed by $a$. Note that $m > u$
implies $l < m↑\prime < m; m < l$ implies $m < m↑\prime < u$;
so the algorithm terminates.
|qleft \qquad [There are no solutions when $b = 2$. The representation
will be unique iff 0 \in $D$; nonunique representation occurs
for example when $D = \{-3, -1, 7\}, b = 3$, since $(α)↓3 =
(|spose ff377|spose ff3α)↓3$. When $b ≤ 3$ it is not difficult
to show that there are exactly 2$↑{b-3}$ solution sets $D$ in
which |$a| < b$ for all $a \in D$. Furthermore the set $D =
\{0, 1, 2 - ε↓2b↑n, 3 - εb↑n, \ldotss , b - 2 - ε↓{b-2}b↑n, b
- 1 - b↑n\}$ gives unique representations, for all $b ≥ 3$ and
$n ≥ 1$, when each $ε↓j$ is 0 or 1.]
\ansno 20. (a) 0.|spose ff1|spose ff1|spose ff1 \ldots = |spose ff1.888
\ldots = |spose ff18$.{1\atop 7}$${1\atop 7}$${1\atop 7}$ \ldots
= |spose ff18${1\atop 7}$$.{222\atop 666}$ \cdots = \cdots =
|spose ff18${123456\atop 765432}$$.{777\atop 111}$ . . . has
nine representations.\xskip (b) A ``$D$-fraction'' $.a↓1a↓2 . . $.
always lies between $-1/9$ and $+71/9$. Suppose $x$ has ten or more
$D$-decimal representations. Then for sufficiently large $k,
10↑kx$ has ten representations which differ to the left of the
decimal point: 10$↑kx = n↓1 + f↓1 = \cdots = n↓{10} + f↓{10}$
where $f↓j$ is a $D$-fraction. By uniqueness of integer representations,
the $n↓j$ are distinct, say $n↓1 < \cdots < n↓{10}$, hence $n↓{10}
- n↓1 ≥ 9$; but this implies $f↓1 - f↓{10} ≥ 9 > 71/9 - (-1/9)$,
a contradiction.\xskip (c) Any number of the form $0.a↓1a↓2 \ldotsm
$, where each $a↓j$ is $-1$ or 8, equals |spose ff1$.a↑{??}↓{1}a↑{??}↓{2}
\ldotsm$ where $a↑{↑\prime}↓{j} = a↓j + 9$ (and it even has six
{\sl more} representations |spose ff18$.a↑{\dprime }↓{1}a↑{\dprime
}↓{2} . . $. etc.).
\ansno 21. We can convert to such a representation by using
a method like that suggested in the test for converting to balanced
ternary.
|qleft \qquad In contrast to the systems of exercise 20, zero
can be represented in infinitely many ways, all obtained from
${1\over 2}$ + \sum $↓{k|}¬R↓1 -4{1\over 2} \cdot 10↑{-k}$ or
from the negative of this representation, by multiplying it
by a power of ten. The representations of unity are 1${1\over
2}$ - ${1\over 2}$, ${1\over 2}$ + ${1\over 2}$, 5 - 3${1\over
2}$ - ${1\over 2}$, 5 - 4${1\over 2}$ + ${1\over 2}$, 50 - 45
- 3${1\over 2}$ - ${1\over 2}$, 50 - 45 - 4${1\over 2}$ + ${1\over
2}$, etc., where \pm ${1\over 2}$ = (\pm 4${1\over 2}$)(10$↑{-1}
+ 10↑{-2} + . . .). [AMM \bf 57} (1950), 90--93.]
\ansno 22. Assuming that we have some approximation $b↓n \ldotsm
b↓1b↓0$ with error \sum $↓{0|}¬E↓{k|}¬E↓n b↓k10↑k
- x > 10↑{-t}$ for $t > 0$, we will show how to reduce the error
by approximately 10$↑{-t}$. (The process can be started by finding
a suitable \sum $↓{0|}¬E↓{k|}¬E↓n b↓k10↑k > x$; then a finite
number of reductions of this type will make the error less than
$ε.)$ Simply choose $m > n$ so large that the decimal representation
of $-10↑mα$ has a one in position $10↑{-t}$ and no ones in positions
10$↑{-t+1}, 10↑{-t+2}, \ldotss , 10↑n$. Then $10↑mα +$ (a suitable
sum of powers of 10 between $10↑m$ and $10↑n) + \sum ↓{0|}¬E↓{k|}¬E↓n
b↓k10↑k \approx \sum ↓{0|}¬E↓{k|}¬E↓n b↓k10↑k - 10↑{-t}$.
\ansno 23. The set $S = \{\sum ↓{k|}¬R↓1 a↓kb↑{-k} | a↓k \in
D\{$ is closed as in exercise 18, hence measurable, and in fact
it has positive measure. Since $bS = ∪↓{a|}¬A↓D (a + S)$, we
have $b\mu (S) = \mu (bS) ≤ \sum ↓{a|}¬A↓D \mu (a + S) = \sum
↓{a|}¬A↓d \mu (S) = b\mu (S)$, and we must have $\mu \biglp
(a + S) ∩ (a↑\prime + S)\bigrp = 0$ when $a ≠ a↑\prime \in D$.
Now $T$ is a union of countably many sets of the form $10↑k\biglp
n + ((a + S) ∩ (a↑\prime + S))\bigrp , a ≠ a↑\prime $, each
of measure zero.
|qleft \qquad [The set $T$ cannot be empty, since the real numbers
cannot be written as a countable union of disjoint, closed,
bounded sets. If $D$ has less than $b$ elements, the set of
numbers representable with radix $b$ and digits from $D$ has
measure zero. If $D$ has more than $b$ elem5, 0 ≤ a↑\prime <
2\}, for $k ≥ 0$. [R. W. Graham and D. W. Matula have shown
that there are no more sets of integer digits with these properties.
And Andrew Odlyzko has shown that the restriction to integers
is superflous, in the sense that if the smallest two elements
of $D$ are 0 and 1, all the digits must be integers. Proof:
Let $A = \{(d↓k \ldotsm d↓0)↓b | d↓j \in D\}$, then [0, ∞) =
∪$↓{a|}¬A↓A (a + S)$ and $(a + S) ∩ (a↑\prime + S)$ has measure
zero for $a ≠ a↑\prime \in A$. We have (0, 1) \subset $S$, and
by induction on $m$ we will prove that $(m, m + 1) \subset a↓m
+ S$ for some $a↓m$. Let $a↓m$ be maximal in $S$ with $(m, m
+ ε)∩ (a↓m + S)$ of positive measure for all $ε > 0$. Then $a↓m
≤ m$, and $a↓m$ must be an integer lest $a↓|"l↓a|infinf m↓|"L
+ S$ overlap $a↓m + S$ too much. If $a↓m > 0$, the fact that
$m - a↓m, m - a↓m + 1) ∩ S$ is of positive measure implies by
induction that this measure is 1, and $(m, m + 1) \subset a↓m
+ S$ since $S$ is closed. If $a↓m = 0$ and $(m, m + 1) |spose
??\subset S$, the maximality of $q↓m$ shows that $m < a↑{↑\prime}↓{m}
< m + 1$ for some $a↑{↑\prime}↓{m} \in A$, where $(m, a↑{↑\prime}↓{m})
\subset S$; but then 1 + $S$ overlaps $a↑{↑\prime}↓{m} + S.]$
|qleft \qquad If we drop the restriction 0 \in $D$ there {\sl
are} many other cases, some of which are quite interesting,
especially the sets \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}, \{1,
2, 3, 4, 5, 51, 52, 53, 54, 55\}, and \{2, 3, 4, 5, 6, 52, 53,
54, 55, 56\}. Alternatively if we allow negative digits we obtain
many other solutions by the method of exercise 19, plus further
sets like \{-1, 0, 1, 2, 3, 4, 5, 6, 7, 18\} which don't meet
the conditions of that exercise; it appears hopeless to find
a nice characterization of all solutions with negative digits.]
\ansno 25. A positive number whose base $b$ representation has
$m$ consectuve $(b - 1)$'s to the right of the decimal point
must have the form $c/b↑n + (b↑m - \theta )/b↑{n+m}$, where
0 < \theta ≤ 1 and $c, n$ are nonnegative integers. So if $u/v$
has this form, we find that $b↑{m+n}u = b↑mcv + b↑mv - \theta
v$. Therefore $\theta v$ is an integer which is a multiple of
$b↑m$. But $0 < \theta v ≤ v < b↑m$. (There can be arbitrarily
long runs of other digits {\sl dddddddd}, if $0 ≤ d < b - 1$,
for example in the representation of $d/(b - 1).\bigrp$
|qleft ???|newtype 58320---Computer Programming\quad (Knuth/
%folio 731 galley 2a (C) Addison-Wesley 1978 -
Addison-Wesley)\quad f. 731\quad Ans\quad g. 2c
\ansno 26. The proof of ``sufficiency'' is straightforward
generalization of the usual proof for base $b$, by successively
constructing the desired representation. The proof of ``necessity''
breaks into two parts: If $β↓{n?}β1$ is greater than $\sum ↓{k|}¬E↓n
c↓kβ↓k$ for some $n$, then $β↓{n+1} - ε$ has no representation
for small $ε$. If $β↓{n?}β1 ≤ \sum ↓{k|}¬E↓n c↓kβ↓k$ for all
$n$, but equality does not always hold, we can show there are
two representations for certain $x$. [See {\sl Transactions
of the Royal Society of Canada}, series III, {\bf 46} (1952),
45--55.]
\ansno 27. Proof by induction on $|n|:$ If $n$ is even we must
take $e↓0 > 0$, and tahe result then follows by induction, since
$n/2$ has a unique such representation. If $n$ is odd, we must
take $e↓0 = 0$, and the problem reduces to representing $-(n
- 1)/2$; the latter quantity is either zero or one, when there
is obviously only one way to proceed, or it has a unique reversing
representation by induction.
\ansno 28. A proof like that of exercise 27 may be given. Note
that $a ?? bi$ is a multiple of $1 + i$ by a complex integer
if and only if $a + b$ is even.
\ansno 29. It suffices kto prove that any collection $T↓0, T↓1,
T↓2, . . $. satisfying Property B may be obtained by collapsing
some sollcetion $S↓0, S↓1, S↓2, . . . $, where $S↓0 = \{0, 1,
\ldotss , b - 1\}$ and all elemenats of $S↓1, S↓2, . . $. are
multiples of $b$.
|qleft \qquad To prove the latter statement, we may assume that
1 \in $T↓0$ and that theare is a least element $b > 1$ such
that $b |spose ??\in T↓0$. We will prove, by induction on $n$,
that if $nb |spose ??\in T↓0$, then $nb ?? 1, bn ?? 2, . . .
, nb ?? b - 1$ are not in any of the $T↓$j's; but if $nb \in
T↓0$, then so are $nb + 1, . . . , nb + b - 1$. The result then
follows with $S↓1 = \{nb | nb \in T↓0\}, S↓2 = T↓1, S↓3 = T↓2$,
etc.
|qleft \qquad If $nb |spose ??\in T↓0$, then $nb = t↓0 + t↓1
\cdotss$, where $t↓1, t↓2, . . $. are multiples of $b$; hence
$t↓0 < nb$ is a multiple of $b$. By induction, $(t↓0 + k) +
t↓1 + t↓2 + \cdot \cdot \cdot$ is the representation of $nb
+ k$, for 0 < $k < b$; hence $nb + k |spose ??\in T↓j$ for any
$j$.
|qleft \qquad If $nb \in T↓0$ and 0 < $k < b$, let the representation
of $nb + k$ be $t↓0 + t↓1 + \cdotss$. We cannot have $t↓j =
nb + k$ for $j ≥ 1$, lest $nb + b$ have two representations
$(b - k) + \cdots + (nb + k) + \cdots = (nb) + \cdots + b +
\cdots $. By induction, $t↓0$ mod $b = k$; and the representation
$nb = (t↓0 - k) ?? t↓1 ?? \cdot \cdot \cdot$ implies that $t↓0
= nb + k$.
|qleft \qquad [Reference: {\sl Nieuw Archief voor Wiskunde (3)
\bf 4} (1956), 15--17. A finite analog of this result was derived
by P. A. MacMahon, {\sl Combinatory Analysis \bf 1} (1915),
217--223.]
\ansno 30. a) Let $A↓j$ be the set of numbers $n$ whose representation
does not involve $b↓j$; then by the uniqueness property, $n
\in A↓j$ iff $n + b↓j |spose ??\in A↓j$. Consequently $↓n \in
A↓j iff n + 2b↓j \in A↓j$. It follows that, for $j ≠ k, n \in
A↓j ∩ A↓k$ iff $n + 2b↓jb↓k \in A↓j ∩ A↓k$. Let $m$ be the number
of integers $n \in A↓j ∩ A↓k$ such that 0 ≤ $n < 2b↓jb↓k$. Then
there are exactly $m$ integers in the same range which are in
$A↓j$ but not $A↓k$, exactly $m$ in $A↓k$ but not $A↓j$, and
exactly $m$ in neither $A↓j$ nor $A↓k$; hence $4m = 2b↓jb↓k$.
Therefore $b↓j$ and $b↓k$ ckan??????d this interval. In this
range 3 → -1 → -2 → 6 → 8 → 2 → 2 → 7 → 0 and 4 → 1 → 5 → 6.
Thus 1 = ?? \cdot 2$↑0 - 13 \cdot 2↑1 + 7 \cdot 2↑2 - 13 \cdot
2↑3 - 13 \cdot 2↑5 - 13 \cdot 2↑9 + 7 \cdot 2↑{10}$.
|qleft \qquad {\sl Note{\sl :}} The choice $d↓0, d↓1, d↓2, \ldots
= 5$, $-3$, 3, 5, $-3$, 3, . . $. also yields a binary basis. For
further details see {\sl Math.\ Comp.\ \bf 18} (1964), 537--546;
A. D. Sands, {\sl Acta Mathematica}, Acad.\ Sci.\ Hung., {\bf 8}
(1957), 65--86.
\ansno 31. (See also the related exercises 3.2.2--11, 4.3.2--13,
4.6.2--22.)
|qleft \qquad a) By multiplying numerator andffd?????????\qquad
a) By multiplying numerator and denominator by suitable powers
of 2, we may assume that $u = (\ldotsm u↓2u↓1u↓0)↓2$ and $v =
(\ldotsm v↓2v↓1v↓0)↓2$, where $v↓0 = 1$. The following computational
method now determines $w$, using the notation $u↑{(n)}$ to stand
for the integer $(u↓{n-1} \ldotsm u↓0)↓2 = u$ mod $2↑n$ when
$n > 0:$
|qleft \qquad Let $w↓0 = u↓0$. For $n = 1, 2, . . . $, assume
that we have found an integer $w↑{(n)} = (w↓{n-1} \ldotsm w↓0)↓2$
such that $u↑{(n)} ≡ v↑{(n)}w↑{(n)}$ (modulo 2$↑n)$. Then $u↑{(n+1)}??444??????
= u↓0$. For $n = 1, 2, . . . $, assume that we have found an
integer $w↑{(n)} = (w↓{n-1} \ldotsm w↓0)↓2$ such that $u↑{(n)}
≡ v↑{(n)}w↑{(n)}$ (modulo 2$↑n)$. Then $u↑{(n+1)} ≡ v↑{(n+1)}w↑{(n)}$
(modulo 2$↑n)$, hence we may let $w↓n = 0$ or 1 according as
$(u↑{(n+1)} - v↑{(n+1)}w↑${(n)})mod 2$↑{n+1}$ is 0 or $2↑n$.
|qleft \qquad b) Find the smallest integer $k$ such that $2↑k
≡ 1$ (modulo $2n + 1)$. Then $1/(2n + 1) = m/(2↑k - 1)$ for
some integer $m, 1 ≤ m < 2↑{k-1}$. Let $α$ be the $k$-bit binary
representation of $m$; then $(0.ααα . . .)↓2$ times $2n + 1$
is (0.111 . . .)$↓2 = 1$ in the binary system, and (\ldotsm $ααα)↓2$
times $2n + 1$ is (\ldotsm 111)$↓2 = -1$ in the 2-adic system.
|qleft \qquad c) If $u$ is rational, say $u = m/2↑tn$ where
$n$ is odd and positive, the 2-adic representation of $u$ is
periodic, because the set of numbers with periodic expansions
includes $-1/n$ and is closed under the operations of negation,
division by 2, and addition.
|qleft \qquad d) The square of any number of the form (\ldotsm
$u↓2u↓11)↓2$ has the form (\ldotsm 001)$↓2, hence the condition
is necessary. To show the sufficiency, use the following procedure
to compute v = |newtype |newtype \sqrt{n}$ when $n$ mod 8 =
1:
$$|indent \qquad {\bf H1.} Set $n ← (n - 1)/8, k ← 2, v↓0 ←
1, v↓1 ← 0, v ← 1$.
|qleft \3skip \qquad {\bf H2.} If $n$ is even, set $v↓k ← 0,
n ← n/2$. Otherwise set $v↓k ← 1, n ← (n - v - 2↑{k-1})/2, v
← v + 2↑k$.
|qleft \qquad {\bf H3.} Increase $k$ by 1 and return to H2.
ansno 32. A generalization appears
in {\sl Math.\ Comp.\ \bf 29} (1975), 84--86.
%folio 733 galley 2b (C) Addison-Wesley 1978 -
|qleft \25skip |newtype {\:r SECTION 4.2.1
|qleft }\11skip |newtype {\bf 1.} $N = (62, +.60 22 52 00);
h = (37, +.10 54 50 00)$. Note that 10$h$ would be (38, +.01
05 45 00).
\ansno 2. $b↑{E-q}(1 - b↑{-p}), b↑{-q-p}; b↑{E-q}(1 - b↑{-p}),
b↑{-q-1}$.
\ansno 3. When $e$ does not have its smallest value,
the most significant ``one'' bit (which appears in all such
normalized numbers) need not appear in the computer word. [This
idea appeared in Zuse's machine in 1936.]
|qleft {\bf 4.} (51, +.10209877); (50, +.12346000); (53, +.99999999).
The third answer would be (54, +.10000000) if the first operand
had been (45, -.50000000).
\ansno 5. If $x ~ y$ and $m$ is an integer then $mb + x ~ mb
+ y$. Another crucial property is that $x/b$ and $y/b$ will
round to the same integer, whenever $x ~ y$.
|qleft \qquad Now if $b↑{-p-2}F↓v ≠ f↓v$ we must have $(b↑{p+2}fv)$mod
$b ≠ 0$; hence the transformation leaves $f↓v$ unchanged unless
$e↓u - e↓v ≥ 2$. Since $u$ was normalized, it is nonzero and
$|f↓u + f↓v| > b↑{-1} - b↑{-2} ≥ b↑{-2}:$ the leading nonzero
digit of $f↓u + f↓v$ must be at most two places to the right
of the radix point, and the rounding operation will convert
$b↑{p+j}(f↓u + f↓v)$ to an integer, where $j ≤ 1$. The proof
will be complete if we can show that $b↑{p+j+1}()f↓u + f↓v)
~ b↑{p+j+1}(f↓u + b↑{-p-2}F↓v)$. By the previous paragraph,
we have $b↑{p+2}(f↓u + f↓v) ~ b↑{p+2}f↓u + F↓v = b↑{p+2}(f↓u
+ b↑{-p-2}F↓v)$, which implies the desired result for all $j
≤ 1$.
|qleft \qquad Note that, when $b > 2$ is even, such an integer
$F↓v$ always exists; but when $b = 2$ we require $p + 3$ bits
(let $2F↓v$ be an integer). When $b$ is odd, an integer $F↓v$
always exists except in the case of division, when a remainder
of ${1\over 2}b$ is possible.
\ansno 6. (Consider the case $e↓u = e↓v, f↓u = -f↓v$ in Program
A.) Register A retains its previous sign, as in ADD.
\ansno 7. Say that a number is normalized iff it is zero or
its fraction part satisfies ${1\over 6}$ < |$f| < {1\over 2}$.
A $(p + 1)$-place accumulator suffices for addition and subtraction;
rounding (except during division) is equivalent to truncation.
A very pleasant system indeed! We might represent numbers with
excess-zero exponent, inserted between the first and subsequent
digits of the fraction, and complemented if the fraction is
negative, so that fixed-point order is preserved.
\ansno 8. (a) (06, +.12345679) \oplus (06, -.12345678), (01,
+.10345678) \oplus (00, -.94000000);\xskip (b) (99, +.87654321) \oplus
itself, (99, +.99999999) \oplus (91, +.50000000).
\ansno 9. $a = (-50, +.10000000)$, $b = (-41, +.20000000)$, $c =
a$, $d = (-41, +.80000000)$, $y = (11, +.10000000)$.
\ansno 10. (50, +.99999000) \oplus (55, +.99999000).
|qleft β????????????|newtype 58320---Computer Programming\quad
%folio 735 galley 3a (C) Addison-Wesley 1978 -
(Knuth/Addison-Wesley)\quad f. 735\quad Ans\quad g. 3c
$${\bf 11.} (50, +.10000001) \oplus (50, +.99999990).
\ansno 12. If 0 < $|f↓u| < |f↓v|$, then $|f↓u| ≤ |f↓v| - b↑{-p}$;
hence $1/b < |f↓u/f↓v| ≤ 1 - b↑{-p}/|f↓v| < 1 - b↑{-p}$. If
0 < |f$↓v| ≤ |f↓u|$, we have $1/b < |f↓u/f↓v|/b ≤ (1 - b↑{-p})/(1/b)/b
= 1 - b↑{-p}$.
\ansno 13. See J. Michael Yohe, {\sl IEEE Transactions} {\bf C-22}
(1973), 577--586.
\ansno 14. |tab FIX\quad |tab STJ \quad |tab 9F\qquad \qquad
|tab Float-to-fix subroutine:⊗⊗⊗STA⊗TEMP\cr
⊗⊗LD1⊗TEMP(EXP)⊗rI1 ← $e.\cr$
⊗⊗SLA⊗1⊗rA ← \pm $ffff0.\cr
⊗⊗JAZ⊗9F⊗$Is input zero?\cr
⊗⊗DEC1⊗1\cr
⊗⊗CMPA⊗=0=(1:1)⊗If leading byte is zero,\cr
⊗⊗JE⊗??-4⊗\quad shift left again.\cr
⊗⊗ENN1⊗-Q-4,1\cr
⊗⊗J1N⊗FIXOVFLO⊗Is magnitude too large?\cr
⊗⊗ENTX⊗0\cr
⊗⊗SRAX⊗0,1\cr
⊗⊗CMPX⊗=1/ /2=\cr
⊗⊗JL⊗9F\cr
⊗⊗JG⊗??+2\cr
⊗⊗JAO⊗9F\cr
⊗⊗STA⊗??+1(0??0)⊗Round, if necessary.\cr
⊗⊗INCA⊗1⊗Add \pm 1 (overflow is impossible).\cr
⊗9H⊗JMP⊗??⊗Exit from subroutine.\cr
\3skip {\bf 15.} |tab \parallel ?? |tab STJ |tab EXITF\qquad
\qquad \qquad |tab Fractional part subroutine:⊗⊗⊗JOV⊗OFLO⊗Ensure
overflow is off.\cr
⊗⊗STA⊗TEMP⊗TEM???????????¬sTA⊗TEMP⊗TEMP ← $u.\cr
⊗⊗$ENTX⊗0\cr
⊗⊗SLA⊗1⊗rA ← $f↓u.\cr
⊗⊗LDS⊗TEMP(EXP)⊗$rI2 ← $e↓u.\cr
⊗⊗$DEC2⊗Q\cr
⊗⊗J2NP⊗??+3\cr
⊗⊗SLA⊗0,2⊗Remove integer part of $u.\cr
⊗⊗$ENT2⊗0\cr
⊗⊗JANN⊗1F\cr
⊗⊗ENN2⊗0,2⊗Fraction is negative: find\cr
⊗⊗SRAX⊗0,2⊗\quad its complement.\cr
⊗⊗ENT2⊗0\cr
⊗⊗JAZ⊗??+2\cr
⊗⊗INCA⊗1\cr
⊗⊗ADD⊗WM1⊗Add word size minus one.\cr
⊗1H⊗INC2⊗Q⊗Prepare to normalize the answer.\cr
⊗⊗JMP⊗NORM⊗Normalize, round, and exit.\cr
⊗8H⊗EQU⊗1(1:1)\cr
⊗WM1⊗CON⊗8B-1,8B-1(1:4)⊗Word size minus one\cr
\3skip {\bf 16.} If $|c| ≥ |d|$, then set $r ← d \odiv c,
s ← c \oplus (r \otimes d); x ← \biglp a \oplus (b \otimes r)\bigrp
\odiv s; y ← \biglp b \ominus (a \otimes r)\bigrp \odiv
s$. Otherwise set $r ← c \odiv d, s ← d \oplus (r \otimes
c); x ← \biglp (a \otimes r) \oplus b\bigrp \odiv s, y ← \biglp
(b \otimes r) \ominus a\bigrp \odiv s$. Then $x + iy$ is the
desired approximation to $(a + bi)/(c + di). [{\sl CACM \bf 5} (1963),
435.] Other algorithms for complex arithmetic and function evaluation
are given by P. Wynn, {\sl BIT \bf 2} (1962), 232--255; see
also Paul Friedland, {\sl CACM \bf 10} (1967), 665.
\ansno 17. See Robert Morris, {\sl IEEE Transactions} {\bf C-20}
(1971), 1578--1579. Error analysis is more difficult with such
systems, so internal arithmetic is correspondingly more desirable.
\ansno 18. For positive numbers: shift fraction left until $f↓1
= 1$, then round, then if the fraction is zero (rounding overflow)
shift it right again. For negative numbers: shift fraction left
until $f↓1 = 0$, then round, then if the fraction is zero (rounding
underflow) shift it right again.
\ansno 19. $\biglp 43 + (1 if $e↓v < e↓u) - (1$ if fraction overflow)
- (10 if result zero) + (4 if magnitude
is rounded up) + (1 if first rounding digit is $b/2) + (5$ if
rounding digits are $W↓2 0 \ldotsm 0) + (7$ if rounding overflow)
+ 7$N + A(-1 + (11$ if $N > 0))\bigrp u$, where $N$ is the number
of left shifts during normalization, $A = 1$ if rX receives
nonzero digits (otherwise $A = 0)$. The maximum time of $73u$
occurs for example when $u = ↓+50 01 00 00 00$, $v = -46 49 99
99 99, b = 100$. [The average time, considering the statistical
data in Section 4.2.4, will be about 45${1\over 2}$$u.]$
%folio 738 galley 3b (C) Addison-Wesley 1978 -
|qleft \3skip \22skip |newtype {\:r SECTION 4.2.2
|qleft }\11skip |newtype {\bf 1.} $u \ominus v = u \oplus -v
= -v \oplus u = -(v \oplus -u) = -(v \ominus u)$.
\ansno 2. u \oplus x ≥ u \oplus 0 = u, by (8), (2), (6); hence
by (8) again, $(u \oplus x) \oplus v ≥ u \oplus v$. Similarly,
(8) and (6) together with (2) imply that $(u \oplus x) \oplus
(v \oplus y) ≥ (u \oplus x) \oplus v$.
\ansno 3. $u = 8.0000001, v = 1.2500008, w = 8.0000008;
(u \otimes v) \otimes w = 80.000064, u \otimes (v \otimes w)=
80.000057$.
\ansno 4. Yes, let 1/$u \approx v = w$ where $v$
is large.
\ansno 5. Not always; in decimal arithmetic take $u = v = 9$.
\ansno 6. (a) Yes.\xskip (b) Only for $b + p ≤ 4$ (try
$u = 1 - b↑{-p})$. [W. M. Kahan observes that the identity {\sl
does} hold whenever $b↑{-1} ≤ f↓u ≤ b↑{-1/2}$. It follows that
1 \odiv \biglp 1 \odiv (1 \odiv $u)\bigrp = 1 \odiv
u$ for all $u.]$
\ansno 7. If $u$ and $v$ are consecutive floating-binary numbers,
$u \oplus v = 2u$ or $2v$. When it is $2v$ we often ha¬e $u|spose
↑|??2`↑2 \oplus v|spose ↑|??2`↑2 < 2v|spose ↑|??2`↑2$. For example,
$u = .10 \ldotsm 001, v = .10 \ldotsm 010, u \oplus v = 2v, u|spose
↑|??2`↑2 \oplus v|spose ↑|??2`↑2 = .10 \ldotsm 011$.
\ansno 8. (a) ~, \approx ;\xskip (b) ~, \approx ;\xskip (c)
~, ??\xskip (d) ~;\xskip (e) ~.
\ansno 9. $|u - w| ≤ |u - v| + |v - w| ≤ ε↓1$ min$(b↑e|infsup
u↑{-q}, b↓e|infinf v↓{-q}) + ε↓2$ min$(b↓e|infinf v↓{-q}, b↓e|infinf
w↓{-q}) ≤ ε↓1b↓e|infinf u↑{-q} ?? ε↓2b↑e|infsup w↑{-q} ≤ (ε↓1
+ ε↓2)$max$(b↑e|infsup u↑{-q}, c↑e|infsup w↑{-q})$. The result
cannot be strengthened in general, since for example we might
have $e↓u$ very small compared to both $e↓v$ and $e↓w$, and
this means $u - w$ might be fairly large under kthe hypotheses.
\ansno 10. If $b > 2$, we have $.a↓1 \ldotsm a↓{p-1}a↓p \otimes
.9 \ldotsm 99 = .a↓1 \ldotsm a↓{p-1}(a↓p - 1)$ if we take $a↓p
≥ 2$; here ``9'' stands for $b - 1$. Furthermore, $.a↓1 \ldotsm
a↓{p-1}a↓p \otimes 1.0 \ldotsm 0 = .a↓1 \ldotsm a↓{p-1}0$, so
the multiplication is not monotone. But when $b = 2$, this argument
can be extended to show that multiplication $is$ monotone; obviously
the ``certain computer'' had $b > 2$.
\ansno 11. Without loss of generality, let $x$ be an integer,
$|x| < b↑p$. If $e ≤ 0$ then $t = 0$. If $0 < e ≤ p$ then $x
- t$ has at most $p + 1$ digits, the least significant being
zero. If $e > p$ then $x - t = 0$. [The result holds also under
the weaker hypothesis $|t| < 2b↑e.]$
\ansno 12. Assume that $e↓u = p, e↓v ≤ 0, u > 0$. Case 1, $u
> b↑{p-1}$. Case (1b), $w = u, |v| ≤ {1\over 2}$. Then $u↑\prime
= u, v↑\prime = 0, u\dprime = u, v\dprime = 0$. If $|v| = {1\over
2}$ and more general rounding is permitted we might also have
$u↑\prime = u \pm 1$. Case (1c), $w = u - 1, v ≤ -{1\over 2},
e↓v = 0$. Then $u↑\prime = u$ or $u - 1, v↑\prime = -1, u\dprime
= u, v\dprime = -1$ or 0. Case 2, $u = b↑{p-1}$. Case (2a),
$w = u + 1, v ≥ {1\over 2}, e↓v = 0$. Like (1a). Case (2b),
$w = u, |v| E {1\over 2}, u↑\prime ≥ u$. Like (1b). Case (2c),
$w = u, |v| ≤ {1\over 2}, u↑\prime < u$. Then $u↑\prime = u
- j/b$ where $v = j/b + v↓1$ and $|v↓1| ≤ {1\over 2b↑{-1}$ for
some positive integer $j ≤ {1\over 2}b$; we have $v↑\prime =
0, u\dprime = u, v\dprime = j/b$. Case (2d), $w < u$. Then $w
= u - j/b$ where $v = -j/b + v↓1$ and $|v↓1| ≤ {1\over 2}b↑{-1}$
for some positive integer $j ≤ b$; we have $(v↑\prime , u\dprime
) = (-j/b, u)$, and $(u↑\prime , v\dprime ) = (u, -j/b)$ or
$(u - 1/b, (1 - j)/b\bigrp $, the latter case only when $v↓1
= {1\over 2}b↑{-1}$. In all cases $u \ominus u↑\prime = u -
u↑\prime , v \ominus v↑\prime = v - v↑\prime , u \ominus u↑\prime
= u - u\dprime , v \ominus v\dprime = v - v\dprime $, round$(w
- u - v) = w - u - v$.
\ansno 13. Since round$(x) = 0$ iff $x = 0$, we want to determine
which integers $m, n$ have the property that $m \odiv n$ is
an integer iff $m/n$ is. Assume that $|m|, |n| < b↑p$. If $m/n$
is an integer then $m \odiv n = m/n$ is also. Conversely if
$m/n$ is not an integer, but $m\odiv n$ is, we have $1/|n|
≤ |m\odiv n - m/n| ≤ {1\over 2} |m/n| b↑{1-p}$, hence $|m|
≥ 2b↑{p-1}$. Our answer is therefore to require $|m| < 2b↑{p-1}$
and $0 < |n| < b↑p$. (Slightly weaker hypotheses are also possible.)
\ansno 14. $|(u \otimes v) \otimes w - uvw| ≤ |(u \otimes v)
\otimes w - (u \otimes v)w| + |w| |u \otimes v - uv| ≤ \delta
↓{(u|}↔N↓{v)|}↔N↓w + b↑e|infsup w↑{-q-1}|infsup w\delta ↓{u|}↔N↓v
≤ (1 + b) \delta ↓{(u|}↔N↓{v)|}↔N↓w$. Now $|e↓{(u|}↔N↓{v)|}↔N↓w
- e↓{u|}↔N↓{(v|}↔N↓w| ≤ 2$, so we may take $ε = {1\over 2}(1
+ b)b↑{2-p}$.
\ansno 15. $u ≤ v$ implies that $(u \oplus u) \odiv 2 ≤ (u
\oplus v) \odiv 2 ≤ (v \oplus v) \odiv 2$, so the condition
holds for all $u$ and $v$ iff it holds whenever $u = v$. For
base $b = 2$, the condition is therefore always satisfied (barring
overflow); but for $b > 2$ there are numbers $v ≠ w$ such that
$v \oplus v = w \oplus w$, hence the condition fails. [On the
other hand, the formula $u \oplus \biglp (v \ominus u) \odiv
2\bigrp$ does give a midpoint in the correct range. {\sl Proof{\sl
:}} It suffices to show that $u + (v \ominus u) \odiv 2 ≤
v$, i.e., $(v \ominus u) \odiv 2 ≤ v - u$; and it is easy
to verify that round\biglp ${1\over 2}$ round$(x)\bigrp ≤ x$
for all $x ≥ 0.]$
\ansno 16. (a) Exponent changes occur at \sum $↓{10} = 11.111111,
\sum ↓{91} = 101.11111, \sum ↓{901} = 1001.1102, \sum ↓{9001}
= 10001.020, \sum ↓{90009} = 100000.91, \sum ↓{900819} = 1000000.0;
\sum ↓{1000000} = 1109099.1.\xskip (b) \sum ↓{1≤k≤n} 1.2345679
= 1224782.1$, and (14) tries to take the square root of -.0053187053.
But (15) and (16) are exact in this case. [If $x↓k = 1 + \lfloor
(k - 1)/2\rfloor 10↑{-7}$, (15) and (16) have errors of order
{\sl n.]\xskip (c) Basically we need to show that $u \oplus (v \ominus
u) \odiv k$ lies between $u$ and $v$; see exercise 15.
|qleft \3skip
|qleft {\bf 17.} ⊗FCMP⊗STJ⊗9F⊗Floating point comparison subroutine:\cr
⊗⊗JOV⊗OFLO⊗Ensure overflow is off.\cr
⊗⊗STA⊗TEMP\cr
⊗⊗LDAN⊗TEMP⊗$v ← -v$.\cr
\quad (Copy here lines 07-20 of Program 4.2.1A.)
|qleft ⊗⊗LDX⊗FV(0:0)⊗Set rX to zero with sign of $f↓v$.\cr
⊗⊗DEC1⊗5\cr
J1N⊗??+2\cr
⊗⊗ENT1⊗0⊗Replace large difference in exponents\cr
⊗⊗SRAX⊗5,1⊗\quad by a smaller one.\cr
⊗⊗ADD⊗FU⊗rA ← difference of operands.\cr
⊗⊗JOV⊗7F⊗Fraction overflow: not ~.\cr
⊗⊗CMPA⊗EPSILON(1:5)\cr
⊗⊗JG⊗8F⊗Jump if not ~.\cr
⊗⊗JL⊗6F⊗Jump if ~.\cr
⊗⊗JXZ⊗9F⊗Jump if ~.\cr
⊗⊗JXP⊗1F⊗If |rA| = $ε$, check sign of rA \times rX.\cr
⊗⊗JAP⊗9F⊗Jump if ~. (rA ≠ 0)\cr
⊗⊗JMP⊗8F\cr
⊗7H⊗ENTX⊗1\cr
⊗⊗SRC⊗1⊗Make rA nonzero with same sign.\cr
⊗⊗JMP⊗8F\cr
⊗1H⊗JAP⊗8F⊗Jump if not ~. (rA ≠ 0)\cr
⊗6H⊗ENTA⊗0\cr
⊗8H⊗CMPA⊗=0=⊗Set comparison indicator.\cr
⊗9H⊗JMP⊗??⊗Exit from subroutine.\cr
β??????|newtype 58320---Computer
%folio 742 galley 4 (C) Addison-Wesley 1978 -
Programming\quad (Knuth/Addison-Wesley)\quad f. 742\quad Ans\quad
g. 4c
$${\bf 19. 19.} Let $\gamma ↓k = \delta ↓k = \eta ↓k = \sigma
↓k = 0$ for $k > n$. It suffices to find the coefficient of
$x↓1$, since the coefficient of $x↓k$ will be the same except
with all subscripts increased by $k - 1$. Let $(f↓k, g↓k)$ denote
the coefficient of $x↓1$ in $(s↓k - c↓k, c↓k)$ respectively.
Then $f↓1 = (1 + \eta ↓1)(1 - \gamma ↓1 - \gamma ↓1\delta ↓1
- \gamma ↓1\sigma ↓1 - \delta ↓1\sigma ↓1 - \gamma ↓1 \delta
↓1\sigma ↓1), g↓1 = (1 + \delta ↓1)(1 + \eta ↓1)(\gamma ↓1 +
\sigma ↓1 + \gamma ↓1\sigma ↓1)$, and $f↓k = (1 - \gamma ↓k\sigma
↓k - \delta ↓k\sigma ↓k - \gamma ↓k\delta ↓k\sigma ↓k)f↓{k-1}
+ (\gamma ↓k - \eta ↓k + \gamma ↓k \delta ↓k + \gamma ↓k\eta
↓k + \gamma ↓k \delta ↓k\eta ↓k + \gamma ↓k\eta ↓k\sigma ↓k
?? \delta ↓k\eta ↓k\sigma ↓k + \gamma ↓k \delta ↓k\eta ↓k\sigma
↓k)g↓{k-1}, g↓k = \sigma ↓k(1 + \gamma ↓k)(1 + \delta ↓k)f↓{k-1}
- (1 + \delta ↓k(\gamma ↓k + \gamma ↓k\eta ↓k + \eta ↓k\sigma
↓k + \gamma ↓k\eta ↓k\sigma ↓k)g↓{k-1}$, for $1 < k ≤ n$. Thus
$f↓n = 1 + \eta ↓1 - \gamma ↓1 + (4n$ terms of 2nd order) +
(higher order terms) = 1 + $\eta ↓1 - \gamma ↓1 + O(nε↑2)$ is
sufficiently small. [The Kahan summation formula was first published
in {\sl CACM \bf 8} (1965), 40; cf.\ {\sl Proc.\ IFIP Congress
(1971), \bf 2}, 1232. For another approach to accurate summation,
see R. J. Hanson, {\sl CACM \bf 18} (1975), 57--58.]
\ansno 20. By the proof of Theorem C, (47) fails for $e↓w =
p$ only if $|v| + {1\over 2} ≤ \parallel w - u| ≥ b↑{p-1} +
b↑{-1}$; hence $|f↓u| ≥ |f↓v| ≥ 1 - ({1\over 2}b - 1)b↑{-p}$.
This rather rare case, in which $|f↓w|$ before normalization
takes its maximum value 2, is necessary and sufficient.
\ansno 21. (Solution by G. W. Veltkamp.) Let $c = 2↑|"p↑{p/2|}"P
\tau 1$; we may assume that $p ≤ 2$, so $c$ is redpresentable.
First compute $u↑\prime = u \otimes c, u↓1 = (u \ominus u↑\prime
) \oplus u↑\prime , u↓2 = u \ominus u↓1; v↑\prime = v \otimes
c, v↓1 = (v \ominus v↑\prime ) \oplus v↑\prime , v↓2 = v \ominus
v↓1$. Then set $w ← u \otimes v, w↑\prime ← \biglp ((u↓1 \otimes
v↓1 B w) \oplus (u↓1 \otimes v↓2)) \oplus (u↓2 \otimes v↓1)\bigrp
\oplus (u↓2 \otimes v↓2)$.
|qleft \qquad It suffices to prove this when $u, v > 0$ and
$e↓u = e↓u = p$, ao that $u$ and $v$ are integers \in [$2↑{p-1},
2↑p)$. Then $u = u↓1 + u↓2$ where $2↑{p-1} ≤ u↓1 ≤ 2↑p, u↓1$
mod $2↑|"p↑{p/2}\rceil = 0$, and $|u↓2| ≤ 2↑|"p↑{p/2|}"P↑{-1}$;
similarly $v = v↓1 ?? v↓2$. The operations during the calculation
of $w↑\prime$ are exact, because $w - u↓1v↓1$ is a multiple
of 2$↑{p-1}$ with $|w - u↓1v↓1| ≤ |w - uv| + |u↓2v↓1 + u↓1v↓2
\tau u↓2v↓2| ≤ 2↑{p-1} ?? 2↑{p+|}"p↑{p/2|}"P \tau 2↑{p-1}$;
and $|w - u↓1v↓1 - u↓1v↓2| ≤ |w - uv| ?? |u↓2v| < 2↑{p-1} +
2↑|"p↑{p/2|}"P↑{-1+p}$, where $w - u↓1v↓1 - u↓1v↓2$ is a multiple
of $2↑|"p↑{p/2|}"P$.
|qleft {\bf 22.} We may assume that $b↑{p-1} ≤ u, v < b↑p$.
If $uv ≤ b↑{2p-1}$ then $x↓1 = uv - r$ where $|r| ≤ {1\over
2}b↑{p-1}$, hence $x↓2 =$ round$(u - r/v) = x↓0$ (since $|r/v|
≤ {1\over 2}b↑{p-1}/b↑{p-1} ≤ {1\over 2}$, and equality implies
$v = b↑{p-1}$ hence $r = 0)$. If $uv > b↑{2p-1}$ then $x↓1 =
uv - r$ where $GrG ≤ {1\over 2}b↑p$, hence $x↓1/v ≤ u - r/v
< b↑p + {1\over 2}b$ and $x↓2 ≤ b↑p$. If $x↓2 = b↑p$ then $x↓3
= x↓1$ (for otherwise $b↑p - {1\over 2})v ≤ x↓1 ≤ b↑p(v - 1))$.
If $x↓2 < b↑p$ and $x↓1 > b↑{2p-1}$ then let $x↓2 = x↓1/v +
q$ where $|q| ≤ {1\over 2}$; we have $x↓3 =$ round$(x↓1 \tau
qv) = x↓1$. Finally if $x↓2 < b↑p$ and $x↓1 = b↑{2p-1}$ and
$x↓3 < b↑{2p-1}$ then $x↓4 = x↓2$ by the first case above. This
situation arises e.g. for $b = 10, p = 2, u = 19, v = 55, x↓1
= 1000, x↓2 = 18, x↓3 = 990$.
|qleft \9skip {\bf 23.} Let $\lfloor u\rfloor = n$, so that
$u$ mod 1 = $u \ominus n = u - n + r$ where $|r| ≤ {1\over 2}b↑{-p}$;
we wish to show that round$(n - r) = n$. The result is clear
if $|n| > 1$; and $r = 0$ when $n = 0$ or 1, so the only subtle
case is when $n = -1$, $r = {1\over 2}b↑{-p}$. The identity fails
iff $b$ is a multiple of 4 and $-b↑{-1} < u < -b↑{-2}$ and $u$
mod $2b↑{-p} = {3\over 2}b↑{-p}$ (e.g., $p = 3, b = 8, u = -.0124)$.
\ansno 24. Let $u = [a, b], v = [c, d]$. Then $u \oplus v =
[a |newtype ??2ffl|newtype c, b |newtype ??2ffi|newtype d]$
where |newtype ??2ffi|newtype is commutative, $x |newtype ??2ffi|newtype
10 = x$ for all $x, x |newtype ??2ffi|newtype - 0 = x$ for all
$x ≠ +0, x |newtype ??2ffi|newtype +∞ = +∞$ for all $x$, and
$x |newtype ??2ffi|newtype -∞$ need'nt be defined; $x |newtype
??2ffl|newtype y = -\biglp (-x) |newtype ??2ffi|newtype (-y)\bigrp
$. Also $u \ominus v = u \oplus (-v)$ where $-v = [-d, -c]$.
|qleft \qquad Let $u \otimes v = [$min($a |newtype ??2ffl|newtype
c, a |newtype ??2ffl|newtype d, b |newtype ??2ffl|newtype c,
b |newtype ??2ffl|newtype d)$, max($a |newtype ??2ffi|newtype
c, a |newtype ??2ffi|newtype d, b |newtype ??2ffi|newtype c,
b |newtype ??2ffi|newtype d)]$, where $x |newtype ??2ffi|newtype
y = y |newtype ??2ffi|newtype x, x |newtype ??2ffi|newtype (-y)
= -(x |newtype ??2ffl|newtype y) = (-x) |newtype ??2ffi|newtype
y; x |newtype ??2ffi|newtype +0 = +0$ for $x > 0, -0$ for $x
< 0; x |newtype ??2ffi|newtype -0 = -(x |newtype ??2ffi|newtype
+0); x |newtype ??2ffi|newtype +∞ = +∞$ for $x > +0, -∞$ for
$x < ↓-0$. (It is possible to determine the min and max simply
by looking at the signs of $a, b, c, d$, thereby computing only
two of the eight products, except when $a < 0 < b$ and $c <
0 < d$; in the latter case we compute four products and the
answer is [min$(a |newtype ??2ffl|newtype d, b |newtype ??2ffl|newtype
c)$, max$(a |newtype ??2ffi|newtype c, b |newtype ??2ffi|newtype
d)].)$
|qleft \qquad Finally, $u \odiv v$ is undefined if $c < 0
< d$; otherwise we use the formulas for multiplication with
$c, d$ replaced by $d↑{-1}, c↑{-1}$, where $x |newtype ??2ffi|newtype
y↑{-1} = x |newtype ??2ffi|newtype y, x |newtype ??2ffl|newtype
y↑{-1} = x |newtype ??2ffl|newtype y, (\pm 0)↑{-1} = \pm ∞,
(\pm ∞)↑{-1} = \pm 0$.
\ansno 25. Cancellation reveals {\sl previous} errors in the
computation of $u$ and $v$. For example, if $εs$is small, we
often get poor accuracy when computing $f(x + ε) \ominus f(x)$,
because the rounded calculation of $f(x + ε)$ destroys much
of the information about $ε$. It is often desirable to reunite
such formulas as $ε \otimes g(x, ε)$, where $g(x, ε) = \biglp
f(x + ε) - f(x)|newtype )/ε$ is first computed symbolically.
Thus, if $f(x) = x↑2$ then $g(x, ε) = 2x + ε$; if $f(x) = |newtype
|newtype \sqrt{x}$ then $g(x, ε) = 1/(|newtype |newtype \sqrt{x
+ ε} + |newtype |newtype \sqrt{x})$.
%folio 745 galley 4b (C) Addison-Wesley 1978 -
|qleft \3skip \22skip |newtype {\:r SECTION 4.2.3
|qleft }\11skip |newtype {\bf 1.} First, $(w↓m, w↓i) = (.573,
.248)$; then $w↓mv↓l/v↓m = .290$; so the answer is (.572, .958).
This in fact is the correct result to six decimals.
\ansno 2. The answer is not affected, since the normalization
routine truncates to eight places and can never look at this
particular byte position. (Scaling to the left occurs at most
once during normalization, since the inputs are normalized.)
\ansno 3. Overflow obviously cannot occur at line 09, since
we are adding two-byte quantities, or at line 22, since we are
adding four-byte quantities. In line 30 we are computing the
sum of three four-byte quantities, so this cannot overflow.
Finally, in line 32, overflow is impossible because the product
$f↓uf↓v$ must be less than unity.
\ansno 4. Insert ``JOV OFLO; ENT1 0'' between lines 03 and 04.
Replace lines 21--22 by ``ADD TEMP(ABS); JNOV ??+2; INC1 1'',
and change lines 28--31 to ``SLAX 5; ADD TEMP; JNOV ??+2; INC1
1; ENTX 0, 2; SRC 5''. This odds five lines of code and only
1, 2, or 3 units of execution time.
\ansno 5. Insert ``JOV OFLO'' after line 06. Change lines 22,
31, 39 respectively to SRAX 0, 1; SLAX 5; ADD ACC. Between lines
40 and 41, insert ``DEC2 1; JNOV DNORM; INC2 1; INCX 1; SRC
;''. [It's tempting to remove the DEC 1 in factor of STZ EXPO,
but then INC2 1 might overflow vI2!] This adds six lines of
code; the time {\sl decreases b}by $3u$ unless there is fraction
overflow, when it increases by $7u$.
\ansno 6.|zfa
|qleft
|qleft |tab DOUBLE⊗STJ⊗EXITDF⊗Convert to double precision:\cr
⊗⊗ENTX0⊗Clear rX.\cr
⊗⊗STA⊗TEMP\cr
⊗⊗LD2⊗TEMP(EXP)⊗rI2 ← $e.\cr
⊗⊗$INC2⊗QQ-Q⊗Correct for difference in excess.\cr
⊗⊗STZ⊗EXPO⊗EXPO ← 0.\cr
⊗⊗SLAX⊗1⊗Remove exponent.\cr
⊗⊗JMP⊗DNORM⊗Normalize and exit.\cr
⊗SINGLE⊗STJ⊗EXITF⊗Convert to single precision:\cr
⊗⊗JOV⊗OFLO⊗Ensure overflow is off.\cr
⊗⊗STA⊗TEMP\cr
⊗⊗LD2⊗TEMP(EXPD)⊗rI2 ← $e.\cr$
⊗⊗DEC2QQ-Q⊗Correct for difference in excess.\cr
⊗⊗SLAX⊗2⊗Remove exponent.\cr
⊗⊗JMP⊗NORM⊗Normalize, round, and exit.\cr
\3skip {\bf 7.} All three routines give zero as the answer if
and only if the exact result would be zero, so we need not worry
about zero denominators in the expressions for relative error.
The worst case of the addition rotuine is pretty bad: Visualized
in decimal notation, if the inputs are 1.0000000 and .99999999,
the answer is $b↑{-7}$ instead of $b↑{-8}:$ thus the maximum
relative error $\delta ↓1$ is $b - 1$, where $b$ is the byte
size.
|qleft For multiplication and division, we may assume that the
exponents of both operands are QQ, and that both operands are
positive. The maximum error in multiplication is readily bounded
by considering Fig.\ 4: When $uv ≥ 1/b$, we have 0 ≤ uv - u \otimes
v < 3b$↑{-9} + (b - 1)b↑{-9}$, so the relative error is bounded
by $(b ?? 2)b↑{-8}$. When 1/$b↑2 ≤ uv < 1/b$, we have 0 ≤ uv
- u \otimes v < 3b↑{-9}$, so the relative error in this case
is bounded by $3b↑{-9}/uv ≤ 3b↑{-7}$. We take $\delta ↓2$ to
be the larger of the two estimates, namely $3b↑{-7}$.
|qleft \qquad Division requires a more careful analysis of Program
D. The quantity actually computed by the subroutine is $α -
\delta - bε\biglp (α - \delta \dprime )(β - \delta ↑\prime )
- \delta \bigrp - \delta ↓n$ where $α = (u↓m ?? εu↓l)/bv↓m,
β = v↓l/bv↓m$, and $\delta , \delta ↑\prime , \delta \dprime
, \delta , \delta ↓n$ are nonnegative truncation errors with
$\delta < b↑{-10}, \delta ↑\prime < b↑{-5}, \delta \dprime <
b↑{-5}, \delta < b↑{-6}$, and $\delta ↓n$ (which occurs during
normalization) is either less than $b↑{-9}$ or $b↑{-8}$ depending
on whether scaling occurs or not. The actual value of the quotient
is $α/(1 ?? bεβ) = α - bεαβ ?? b↑2αβ↑2\delta \dprime \dprime
$, where $\delta \dprime \dprime$ is the nonnegative error due
to truncatioon of the infinite series (3); $\delta \dprime \dprime
< ε↑2$, since it is an alternating series. The relative error
is therefore the absolute value of $(bε\delta ↑\prime ?? bε\delta
\dprime β/α ?? bε\delta /α) - (\delta /α ?? bε\delta ↑\prime
\delta \dprime /α ?? b↑2β↑2\delta \dprime \dprime + \delta ↓n/α)$,
times $(1 ?? bεβ)$. The positive terms in this expression are
bounded by $b↑{-9} ?? b↑{-8} ?? b↑{-8}$, and the negative terms
are bounded by $b↑{-8} ?? b↑{-12} ?? b↑{-8}$ plus the contribution
by the normalizing phase, which can be about $b↑{-7}$ in magnitude.
It is therefore clear that the potentially greatest part of
the relative error comes during the normalization phase, and
that $\delta ↓3 = (b ?? 2)b↑{-8}$ is a safe upper bound for
the relative error.
\ansno 8. Addition: If $e↓u ≤ e↓v ?? 1$, the entire relative
error occurs during the normalization phase, so it is bounded
above by $b↑{-7}$. If $e↓u ≥ e↓v + 2$, and it the signs are
the same, again the entire error may be ascribed to normalization;
if the signs are opposite, the error due to shifting digits
out of the register is in the opposite direction from the subsequent
error introduced during normalization. Both of these errors
are bounded by $b↑{-7}$, hence $\delta ↓1 = b↑{-7}$. (This is
substantially better then the result in exercise 7.)
|qleft \qquad Multiplication: An analysis as in |newtype 58320---Computer
Programming\quad (Knuth/Addi
%folio 748 galley 5 (C) Addison-Wesley 1978 -
son-Wesley)\quad Ans\quad f. 748\quad g. 5
|qleft \23skip |newtype {\:r SECTION 4.2.4
|qleft }\11skip |newtype {\bf 1.} Since fraction overflow can
occur only when the operands have the same sign, this is the
probability that fraction overflow occurs divided by the probability
that the operands have the same sign, namely, 7%/$-{1\over 2}$(91%)
\approx 15%.
\ansno 2. log$↓{10} 2.3 - log↓{10} 2.2 \approx 1.930516%$.
\ansno 4. The pages would be uniformly gray (same as ``random
point on a slide rule'').
\ansno 5. The probability that $10f↓U ≤ r$ is $(r - 1)/10 ??
(r - 1)/100 + \cdots = (r - 1)/9$. So in this case the leading
digits are {\sl uniformly} distributed, e.g., leading digit
1 occurs with probability ${1\over 9}$.
\ansno 6. The probability that there are three leading zero
bits is log$↓{16} 2 = {1\over 4}; the probability that there
are two leading zero bits is log↓{16} 4 - log↓{16} 2 = {1\over
4}; and similarly for the other two cases. The ``average'' number
of leading zero bits is 1{1\over 2}, so the ``average'' number
of ``significant bits'' is p ?? {1\over 2}$. The worst case,
$p - 1$ bits, occurs however with rather high probability. In
practice, it is usually necessary to base error estimates on
the worst case; in the error analysis of Section 4.2.2, the
upper bound on relative rounding error for floating-hex is 2$↑{1-p}$.
In the binary case we can have $p + 1$ significant bits at all
times (cf.\ exercise 4.2.1--3), with relative rounding errors
bounded by 2$↑{-1-p}$. Extensive computational experience confirms
that floating-binary (even with $p$-bit precision instead of
$p + 1)$ produce significantly more accurate results than $(p
+ 2)$-bit floating-hex.
|qleft \qquad Sweeney's tables show that hexadecimal arithmetic
can be done a little faster, since fewer cycles are needed when
scaling to the right or normalizing to the left. But this fact
is insignificant compared to the substantial advantages of $b
= 2$ over all other radices (cf.\ also Theorem 4.2.2C and exercises
4.2.2--15, 21), especially since floating-binary can be made
as fast as floating-hex with only a tiny increase in total processor
cost.
\ansno 7. Suppose that $\sum ↓m \biglp F(10↑{km}
\cdot 5↑k) - F(10↑{km})\bigrp =$ log 5$↑k/$log 10$↑k$ and $\sum
↓m \biglp (F(10↑{km} \cdot 4↑k) - F(10↑{km})\bigrp =$ log $4↑k/$log
$10↑k$; then
$$\sum ↓{m} \biglp F(10$↑{km} \cdot 5↑k) - F(10↑{km} \cdot 4↑k)\bigrp
=$ log$↓{10} {5\over 4}$
|qctr \9skip for all $k$. But now let $ε$ be a small positive
number, and choose $\delta > 0$ so that $F(x) < ε$ for 0 < $x
< \delta $, and choose $M > 0$ so that $F(x) ??Q???\3skip \sum
↓{m} \biglp F(10↑{km} \cdot 5↑k) - F(10↑{km} \cdot 4↑k)\bigrp
=$ log$↓{10} {5\over 4}$
|qctr \9skip for all $k$. But now let $ε$ be a small positive
number, and choose $\delta > 0$ so that $F(x) < ε$ for 0 < $x
< \delta $, and choose $M > 0$ so that $F(x) > 1 - ε$ for $x
> M$. We can take $k$ so large that 10$↑{-k} \cdot 5↑k < \delta$
and $4↑k > M$; hence by the monotonicity of $F$,
$$\sum ↓{m} \biglp F(10$↑{km} \cdot 5↑k) - F(10↑{km} \cdot 4↑k)\bigrp
|tab ≤ \sum ↓{m≤0} \biglp F(10↑{km} \cdot 5↑k) - F(10↑{k(m-1)}
\cdot 5↑k)\bigrp \quad ⊗⊗\quad + \sum ↓{m≥0} \biglp F(10↑{k(m+1)}
\cdot 4↑k) - F(10↑{km} \cdot 4↑k)\bigrp \cr
⊗= F(10↑{-k}5↑k) \tau 1 - F(10↑k4↑k) < 2ε.\cr
\3skip$ {\bf 8.} When $s > r, P↓0(10↑ns)$ is 1 for small $n$,
and 0 when \lfloor 10$↑ns\rfloor > \lfloor 10↑nr\rfloor $. The
least $n$ for which this happens may be arbitrarily large, so
no uniform bound can be given for $N↓0(ε)$ independent of $s$.
(In general, calculus textbooks prove that such a uniform bound
would imply that the limit function $S↓0(s)$ would be continuous,
and it isn't.)
\ansno 9. Let $q↓1, q↓2, . . $. be such that $P↓0(n) = q↓1({n-1\atop
0}) + q↓2({n-1\atop 1}) + . . $. for all $n$. It follows that
$P↓m(n) = 1↑{-m}q↓1({n-1\atop 0}) + 2↑{-m}q↓2({n-1\atop 1})
+ . . $. for all $n$.
\ansno 10. When $1 < r < 10$ the generating function $C(z)$
has simple poles at the points $1 + w↓r, w↓n = 2πn↓i/$ln 10,
hence
$$\6skip $C(z) = {$log$↓{10} r - 1\over 1 - z} + \sum ↓{nn0}
{1 + w↓n\over w↓n} {e↑{-w}|infsup n$$↑{lnr} - 1\over$ (ln 10)($z
- 1 - w↓n)} + E(z)$
|qctr \8skip where $E(z)$ is analytic in the entire plane. Thus
if $\theta =$ arctan(2$π/$ln 10),
$$$c↓m |tab =$ log$↓{10} r - 1 - {2\over$ ln 10} \sum ↓{n>0}
\left({e↑{-w↓$nln$r}}↓{ - 1??2w↓n(1 + w↓n)↑m}} + e↓m\qquad \qquad
⊗\4skip ⊗=$ log$↓{10} r - 1 + {$sin($m\theta + 2π$ log$↓{10}
r) $- sin($m\theta )\over π(1 + 4π↑2/$(ln 10)$↑2\bigrp ↑{m/2}}
+ O\left({1\over \biglp 1 + 16π↑2/$(ln) 10($↑2\bigrp ↑{m/2}}}.\cr
\3skip \bf 11.} When (log$↓b U)$ mod 1 is uniformly distributed
in [0, 1), so is (log$↓b 1/U)$ mod 1 = 1 - (log$↓b U)$mod 1.
(The latter formula holds except when $U$ is a power of $b$,
but that occurs with probability zero.)
\ansno 12. $h(z) = \int ↑{z}↓{1/b} f(x) dxg(z/bx)/bx + \int
↑{1}↓{z} f(x) dxg(z/x)/x$, and $l(z) = \int ↑{z}↓{1/b} f(x)
dxl(z/bx)/bx + \int ↑{1}↓{z} f(x) dxl(z/x)/x$, hence $\biglp
h(z) - l(z)\bigrp /l(z) = \int ↑{z}↓{1/b} f(x) dx\biglp g(z/bx)
- l(z/bx)\bigrp /l(z/bx) + \int ↑{1}↓{z} f(x) dx\biglp g(z/x)
- l(z/x)\bigrp /l(z/x)$. Since $f(x) ≥ 0, |\biglp h)z) - l(z)|newtype
)/l(z)| ≤ \int ↑{z}↓{1/b} f(x) dxA(g) + \int ↑{1}↓{z} f(x) dxA(g)$
for all $z$, hence $A(h) ≤ A(g)$. By symmetry, $A(h) ≤ A(f)$.
[{\sl Bell System Tech.\ J.\ \bf 49} (1970), 1609--1625.]
\ansno 13. Let $X =$ (log$↓b U)$mod 1, $Y =$ (log$↓b V)$mod
1, so that $S$ and $Y$ are independently and uniformly distributed
in [0, 1). No left shift is needed if and only if $X + Y ≥ 1$,
and this occurs with probability ${1\over 2}$.
|qleft \qquad (Similarly, this needs only the weaker assumption
that both operands independently have the {\sl same} distribution.)
\ansno 14. For convenience, the calculations are shown here
for $b = 10$. If $k = 0$, the probability of a carry is
$$\left(${1\over$ ln 10}}↑2\int ↓{1≤x,y≤10??x+y≥10} {dx\over
x} {dy\over y}$.
|qctr \9skip (See Fig.\ A-7.) The value of the integral is
$$$\int ↑{10}↓{0} {dy\over y} \int ↑{10}↓{10-y} {dx\over x}
- 2\int ↑{1}↓{0} {dy\over y}\int ↑{10}↓{10-y}{dx\over x}$,
|qctr and
$$$\int ↑{t}↓{0}{dy\over y}$ ln\left(${1\over 1 - y/10}$} =
\int ↑{t}↓{0}\left(${1\over 10}$ + ${y\over 200}$ + ${y↑2\over
3000}$ + \cdot 0 0} dy = ${t\over 10}$ + ${t↑2\over 400}$ +
${t↑3\over 9000}$ + \cdotss.
|qctr (The latter integral is essentially a ``dilogarithm''
integral.) Hence the probability of a carry when $k = 0$ is
(1/ln 10)$↑2(π↑2/6 - 2 \sum ↓{n≥1} 1/n↑210↑n) = .27154. [Note{\sl
:}$ Whenn ?????????εb = 2 and $k = 0$, fraction overflow {\sl
always} occurs, so this derivation proves that $\sum ↓{n≥1}
1/n↑22↑n = π↑2/12 - {1\over 2}($ln 2)$↑2.]$
|qleft \qquad When $k > 0$, the probability is
$$\left(${1\over ln 10}$}$↑2\int ↑{10↑{1-k}}↓{10↑{-k}}{dy\over
y}\int ↑{10}↓{10-y}{dx\over x} = \left({1\over$ ln 10}}↑2\left(\sum
↓{n≥1} {1\over n↑210↑{nk}} - \sum ↓{n≥1} {1\over n↑210↑{n(k+1)}}}$.
|qctr \9skip Thus when $b = 10$, fraction overflow should occur
with probability .272$p↓0 + .017p↓1 + .002p↓2 + \cdotss$. When
$b = 2$ the corresponding figures are $p↓0 + .655p↓1 + .288p↓2
+ ↓137p↓3 + .067p↓4 + \cdotss$.
|qleft \qquad Now if we use the probabilities from Sweeney's
first table, dividing by .92 to eliminate zero operands, and
assuming that the probabilities are independent of the operand
signs, we predict a probability of about 14 percent when $b
= 10$, instead of the 15 percent in exercise 1. For $b = 2$,
we predict about 48 percent, while the table yeilds 44 percent.
These results are certainly in agreement within the limits of
experimental error.
\ansno 15. When $k = 0$, the leading digit is 1 if and only
if there is a carry. (It is possible for fraction overflow and
subsequent rounding to yield a leading digit of 2, when $b ≥
4$, but we are ignoring rounding in this exercise.) The probability
of fraction overflow is .272 < log$↓{10} 2, as shown in the
previous exercise$.
|qleft \qquad When $k > 0$, the leading digit is 1 with probability
$$\left(${1\over ln 10}$}$↑2\left(\int ↑{10↑{1-k}}↓{10↑{-k}}{dy\over
y}\int ↓{1≤x<2-y\qquad \quad }{dx\over x}} < \left({1\over$
ln 10}}↑2\left(\int ↑{10↑{1-k}}↓{10↑{-k}}{dy\over y}\int ↓{1≤x≤2{dx\over
x}} =$ log$↓{10} 2$.
|qctr \3skip {\bf 16.} To prove the hint [which is due to Landau,
{\sl Prace matematyczno-fizyczne \bf 21} (1910), 103--113],
assume first that lim sup $a↓n = 8 > 0$. Let $ε = λ/(λ + 4M)$
and choose $N$ so that $|a↓1 + \cdots + a↓n| < {1\over 10}ελn$
for all $n > N$. Let $n > N/(1 - ε), n > 5/ε$ be such that $a↓n
> {1\over 2}λ$. Then by induction $a↓{n-k} ≥ a↓n - kM/(n - εn)
> {1\over 4}λ$ for $0 ≤ k < εn$, and $\sum ↓{n-εn<k≤n} a↓k ≥
{1\over 4}λ(εn - 1) > {1\over 5}λεn$. But $|\sum ↓{n-εn<k≤n
a↓k| = |\sum ↓{1≤k≤n} a↓k - \sum ↓{1≤k≤n-εn} a↓k| ≤ {1\over
5}ελn$ since $n - εn > N$. A similar contradiction applies if
lim ing $a↓n < 0$.
|qleft \qquad Assuming that $P↓{m+1}(n) → λ$ as $n → ∞$, let
$a↓k = P↓m(k) - λ$. If $m > 0$, the $a↓k$ satisfy the hypotheses
of the hint (cf.\ Eq.\ 4.2.2--15), $m > 0$, the $a↓k$ satisfy
the hypotheses of the hint (cf.\ Eq.\ 4.2.2--15), since $0 ≤ P↓m(k)
≤ 1$; hence $P↓m(n) → λ$.
\ansno 17. See {\sl Fibonacci Quarterly \bf 7} (1969), 474--475.
(Persi Diaconis [Ph.D. thesis, Harvard University, 1974] has
shown among other things that the definition of probability
by repeated averaging is weaker than harmonic probability, in
the following precise sense: If lim$↓{m→∞}\liminf↓{n→∞}
Pm(n) =$ lim$↓{m→∞}\limsup↓{n→∞} P↓n(n) = λ$ then
the harmonic probability is $λ$. On the other hand the statement
``10$↑k|supsup 2 ≤ n < 10↑k|supsup 2↑{+k}$ for some integer
$k > 0$'' has logarithmic probability ${1\over 2}$, while repeated
averaging never settles down to give it any particular probability.)
|qleft β|newtype 583
%folio 754 galley 6 (C) Addison-Wesley 1978 -
20---Computer Programming\quad (Knuth/Addison-Wesley)\quad f.
754\quad Ans\quad g. 6
|qleft \25skip |newtype {\:r SECTION 4.3.1
|qleft }\11skip |newtype {\bf 2.} If the $i$th number to be
added is $u↓i = (u↓{i1}u↓{i2} \ldotsm u↓{in})↓b$, use Algorithm
A with step A2 changed to the following:
|qleft |indent \qquad {\bf A2}↑\prime {\bf .} [Add digits.]
Set $w↓j ← (u↓{1j} + \cdots + u↓{mj} + k)$mod $b$, and $k ←
\lfloor (u↓{1j} + \cdots + u↓{mj} + k)/b\rfloor $. [The maximum
value of $k$ is $m - 1$, so step A3 would have to be altered
if $m > b.)$
|qleft \3skip |cancelindent
|qleft ⊗ {\bf 3.}⊗⊗ENT1⊗N⊗ 1\cr
⊗⊗⊗JOV⊗OFLO⊗ 1⊗Ensure overflow is off.\cr
⊗⊗⊗ENTA⊗0⊗ 1⊗$k ← 0.\cr
⊗⊗$2H⊗ENT2⊗0⊗ $N⊗$(rI2 ≡ next value of $k)\cr
⊗⊗⊗$ENT3⊗M??N-N,1⊗ $N⊗$\biglp LOC$(u↓{ij}) ≡ U + n(i - 1) +
j\bigrp \cr
⊗⊗$3H⊗ADD⊗U,3⊗$MN⊗$rA ← rA + $u↓{ij}$.\cr
⊗⊗⊗JNOV⊗??+2⊗$MN⊗\cr$
⊗⊗⊗INC21⊗$ K⊗$Carry one.\cr
⊗⊗⊗DEC3⊗N⊗$MN⊗$Repeat for $b ≥ i ≥ 0$.\cr
⊗⊗⊗J3P⊗3B⊗$MN⊗$\biglp rI3 ≡ $n(i - 1) + j\bigrp$ \cr
⊗⊗⊗STA⊗W,1⊗ $N⊗w↓j ←$ rA.\cr
⊗⊗⊗ENTA⊗0,2⊗ $N⊗k ←$ rI2.\cr
⊗⊗⊗DEC1⊗1⊗ $N$\cr
⊗⊗⊗JIP⊗2B⊗ $N⊗$Repeat for $n ≥ j ≥ 0.\cr
⊗⊗⊗$STA⊗W⊗ 1⊗Store final carry in $w↓0.\cr
\9skip$ Running time, assuming that $K = {1\over 2}MN$, is 5.5$MN
+ 7N + 4$ cycles.
\ansno 4. We may make the following assertion before
A1: $``n ≥ 1$; and 0 ≤ $u↓i, v↓i < b$ for $1 ≤ i ≤ n.$'' Before
A2, we assert: ``1 ≤ $j ≤ n; 0 ≤ u↓i, v↓i < b$ for 1 ≤ $i ≤
n; 0 ≤ w↓i < b$ for $j < i ≤ n$; 0 ≤ $k ≤ 1$; and
$$$(u↓{j+1} \ldotsm u↓n)↓b + (v↓{j+1} \ldotsm v↓n)↓b = (kw↓{j+1}
\ldotsm w↓n)↓b.$''
|qctr \9skip The latter statement means more precisely that
$$$\sum ↓{j<t≤n} u↓tb↑{n-t} + \sum ↓{j<t≤n} v↓tb↑{n-t} = kb↑{n-j}
+ \sum ↓{j<t≤n} w↓tb↑{n-t}$.
|qctr \9skip ????????????Before A3, we assert: ``1 ≤ $j ≤ n;
0 ≤ u↓i, v↓i < b$ for 1 ≤ $i ≤ n; 0 ≤ w↓i < b$ for $j ≤ i ≤
n; 0 ≤ k ≤ 1$; and $(u↓j \ldotsm u↓n)↓b + (v↓j \ldotsm v↓n)↓b
= (kw↓j \ldotsm w↓n)↓b.$'' After A3, we assert that 0 ≤ $w↓i
< b$ for 1 $≤ i ≤ n; 0 ≤ w↓0 ≤ 1$; and $(u↓1 \ldotsm u↓n)↓b +
(v↓1 \ldotsm v↓n)↓b = (w↓0 \ldotsm w↓n)↓b$.
|qleft \qquad It is a simple matter to complete the proof by
verifying the necessary implications between the assertions
and by showing that the algorithm always terminates.
|qleft \3skip |indent {\bf 5. B1.} Set $j ← 1, w↓0 ← 0$.
|qleft \qquad {\bf B2.} Set $t ← u↓j + v↓j, w↓j ← t$ mod $b,
i ← j$.
|qleft \qquad {\bf B3.} If $t ≥ b$, set $i ← i - 1, t ← w↓i
+ 1, w↓i ← t$ mod $b$, and repeat this step until $t ← b$.
|qleft \qquad {\bf B.} Increase $j$ by one, and if $j ≤ n$ go
back to B2.
\ansno 6. {\bf C1.} Set $j ← 1, i ← 0, r ← 0$.
|qleft \qquad {\bf C2.} Set $t ← u↓j + v↓j$. If $t ≥ b$, set
$w↓i ← r + 1, w↓k ← 0$ for $i < k < j$, set $i ← j$, and $r
← t$ mod $b$. Oterwise if $t < b - 1$, set $w↓i ← r, w↓k ← b
- 1$ for $i < k < j$, set $i ← j$, and $r ← t$.
|qleft \qquad {\bf C3.} Increase $j$ by one. If $j ≤ n$, go
back to C2; otherwise set $w↓i ← r$, and $w↓k ← b - 1$ for $i
< k ≤ n$.
|qleft \3skip |cancelindent {\bf 7.} When $j = 3$, for example,
we hjave $k = 0$ with probability $(b + 1)/2b, k = 1$ with probability
$\biglp (b - 1)/2b\bigrp (1 - 1/b)$, namely the probability
that a carry occurs and thathe preceding digit wasn't $b - 1;
k = 2$ with probability $\biglp (b - 1)/2b\bigrp (1 - 1/b)$;
and $k = 3$ with probability $\biglp (b - 1)/2b\bigrp (1/b)(1/b)(1)$.
For fixed $k$ we may add the probabilities as $j$ varies from
1 to $n$; this gives the mean number of times the carry propagates
back $k$ places,
$$$m↓k = {b - 1\over 2b↑k}|newtype \left(|newtype (n + 1 - k)\left(1
- {1\over b}} + {1\over b}|newtype }|newtype $.
|qctr \9skip As a check, we find that the average number of
carries is
$$$m↓1 + 2m↓2 + \cdots + nm↓n = {1\over 2}|newtype \left(|newtype
n - {1\over b - 1}\left(1 - \left({1\over b}}↑n}|newtype }|newtype
$,
|qctr \9skip in agreement with (6).
|qleft \3skip
|qctr ⊗{\bf 8.}⊗ENNI⊗N⊗1⊗2H⊗LDA⊗W+N+1,2⊗$K\cr$
⊗⊗⊗JOV⊗OFLO⊗1⊗⊗INCA⊗1⊗$K\cr
⊗⊗⊗STZ$⊗W⊗1⊗⊗STA⊗W+N+1,2⊗$K$\cr
⊗⊗1H⊗LDA⊗U+N+1,1⊗$N⊗⊗$DEC2⊗1⊗$K\cr$
⊗⊗⊗ADD⊗V+N+1,1⊗$N⊗⊗$JOV⊗2B$⊗K$\cr
⊗⊗⊗STA⊗W+N+1,1$⊗N$⊗3H⊗INC2⊗1$⊗N\cr$
⊗⊗⊗JNOV⊗3F$⊗N$⊗3H⊗INC2⊗1$⊗N\cr$
⊗⊗⊗ENT2⊗-1,1$⊗L\cr
\9skip$ The running time depends on $L$, the number of positions
in which $u↓j ?? v↓j ≥ b$; and on $K$, the total number of carries.
It is not difficult to see that $K$ is the same quantity analysis
in the text shows that $L$ has the average value $N\biglp (b
- 1)/2b\bigrp $, and $K$ has the average value ${1\over 2}$($N
- b↑{-1} - b↑{-2} - \cdots - b↑{-n})$. So if we ignore terms
of order 1/$b$, the running time is $9N ?? L + 7K + 3 \approx
13N + 3$ cycles.
|qleft \qquad {\sl Note{\sl :}} Since a carry occurs almost
half of the time, it would be more efficient to delay storing
the result by one step. This leads to a somewhat longer program
whose running time is approximately 12$N + 55 ??????\qquad Note{\sl
:}$ Since a carry occurs almost half of the time, it would be
more efficient to delay storing the result by one step. This
leads to a somewhat longer program whose running time is approximately
12$N + 5$ cycles, based on the somewhat more detailed information
calculated in exercise 7.
\ansno 9. Replace $``b$'' by $``b↓{n-j}$'' everywhere in step
A2.
\ansno 10. If lines 06 and 07 were interchanged, we would almost
always have overflow, but register A might have a negative value
at line 08, so this would not work. If the instructions on lines
05 and 06 were interchanged, the sequence of overflows occurring
in the program would be slightly different in some cases, but
the program would still be right.
\ansno 11. (a) Set $j
← 1$;\xskip (b) if $u↓j < v↓j$, terminate $[u < v]$; if $u↓j = v↓j$
and $j = n$, terminate $[u = v]$; if $u↓j = v↓j$ and $j < n$,
set $j ← j \tau 1$ and repeat (b); if $u↓j > v↓j$, terminate
$[u > v]$. This algorithm tends to be quite fast, since there
is usually low probability that $j$ will have to get very high
before we encounter a case with $u↓j ≠ v↓j$.
\ansno 12. Use Algorithm S with $u↓j = 0$ and $v↓j = w↓j$. Another
``borrow'' will occur at the end of the algorithm, which this
time should be ignored.
\ansno 13. |tab \quad \quad |tab ENTX\quad |tab N\qquad \qquad
|tab 1⊗⊗⊗JOV⊗OFLO⊗1\cr
⊗⊗ENTX⊗O⊗1\cr
⊗2H⊗STX⊗CARRY⊗$N$\cr
⊗⊗LDA⊗U,1$⊗N$\cr
⊗⊗MUL⊗V⊗$N$\cr
⊗⊗SLC⊗5$N$\cr
⊗⊗ADD⊗CARRY$⊗N$\cr
⊗⊗JNOV⊗??+2$⊗N$\cr
⊗⊗Ki????????????x⊗1$⊗K$\cr
⊗⊗STA⊗W,1$⊗N\cr$
⊗⊗DEC1⊗1$⊗N$\cr
⊗⊗J1P⊗2B$⊗N$\cr
⊗⊗STX⊗W⊗1\cr
\9skip The running time is 23$N ?? K + 5$ cycles, and $K$ is
roughly ${1\over 2}N$.
\ansno 14. The key inductive assertion is the one which should
be valid at the beginning of step M4; all others are readily
filled in from this one, which is as follows: ``1 ≤ $i ≤ n;
1 ≤ j ≤ m; 0 ≤ u↓r < b$ for $1 ≤ r ≤ n; 0 ≤ v↓r < b$ for $1
≤ r ≤ m; 0 ≤ w↓r < b$ for $j < r ≤ m + n; 0 ≤ k < b$; and
$$$(w↓{j?}β1 \ldotsm w↓{m+n})↓b + kb↑{m+n-i-j} = u \times (v↓{j+1}
\ldotsm v↓m)↓b + (u↓{i+1} \ldotsm u↓n)↓b \times v↓jb↑{m-j}??2.????????????\9skip
(w↓{j?}β1 \ldotsm w↓{m+n})↓b + kb↑{m+n-i-j} = u \times (v↓{j+1}
\ldotsm v↓m)↓b + (u↓{i+1} \ldotsm u↓n)↓b \times v↓jb↑{m-j}.$''
|qctr \9skip (For the precise meaning of this notation, see
the answer to exercise 4.)
\ansno 15. The error is nonnegative and less than $(n - 2)b↑{-n-1}$.
$\biglp$Similarly, if we ignore the products with $i + j > n
+ 3$, the error is bounded by $(b - 3)b↑{-n-2}$, etc.; but,
in qome cases, we must compute all of the products if we want
to get the true rounded result.$\bigrp$
\ansno 16. {\bf S1.} Set $r ← 0, j ← 1$.
|qleft \qquad {\bf S2.} Set $w↓j ← \lfloor (rb + u↓j)/v\rfloor
, r ← (rb + u↓j)$mod $v$.
|qleft \qquad {\bf S3.} Increase $j$ by 1, and return to S2
if $j ≤ n$.
\ansno 17. $u/v > u↓0b↑n/(v↓1 + 1)b↑{n-1} = b(\biglp 1 - 1/(v↓1
\tau 1)\bigrp > b\biglp 1 - 1/(b/2)\bigrp = b - 2$.
\ansno 18. $(u↓0b + u↓1)/(v↓1 + 1) ≤ u/(v↓1 + 1)b↑{n-1} < u/v$.
\ansno 19. $u - |spose 7qv ≤ u - |spose 7qv↓1b↑{n-1} - |spose
7qv↓2b↑{n-2} = u↓2b↑{n-2} + \cdots + u↓n + 4??????↑{n?}??4g1
- |spose 7qv↓2b↑{n-2} < b↑{n-2}(u↓2 + 1 + |spose 7rb - |spose
7qv↓2) ≤ 0$. Since $u - |spose 7qv < 0, q < |spose 7q$.
\ansno 20. If $q ≤ |spose 7q - 2$, then $u < (|spose 7q - 1)v
< |spose 7q(v↓1b↑{n-1} + (v↓2 + 1)b↑{n-2}\bigrp - v < |spose
7qv↓1b↑{n-1} + |spose 7qv↓2b↑{n-2} + b↑{n-1} - v ≤ |spose 7qv↓1b↑{n-1}
+ (b|spose 7r + u↓2)b↑{n-2} + b↑{n-1} - v = u↓0b↑n + u↓1b↑{n-1}
+ u↓2b↑{n-2} + b↑{n-1} - v ≤ u↓0b↑n + u↓1b↑{n-1} + u↓2b↑{n-2}
≤ u$. In other words, $u < u$, and this is a contradiction.
\ansno 21. Assume that $u - qv < (1 - 3/b)v$; then $|spose 7q(v
- b↑{n-2}) < |spose 7q(v↓1b↑{n-1} + v↓2b↑{n-2}) ≤ u↓0b↑n + u↓1b↑{n-1}
+ u↓2b↑{n-2} ≤ u$; hence $1 = |spose 7q - q < u/(v - b↑{n-2})
- u/v + 1 - 3/b$; that is,
$$${3\over b} < {u\over v}\left({b↑{n-2}\over v - b↑{n-2}}}
< \left({b↑{n-1}\over v - b↑{n-2}}}$.
|qctr \8skip Finally, therefore, $b↑n + 3b↑{n-2} > 3v$; but
this contradicts the size of $v↓1$, unless $b ≤ 3$, and the
exercise is obviously true when $b ≤ 3$.
|qleft β|newtype 58320---Computer Programming\quad (Knuth/Addison
%folio 758 galley 7a (C) Addison-Wesley 1978 -
-Wesley)\quad f. 758\quad Ans\quad g. 7
\ansno 22. Let $u = 4100, v = 588$. We first try
$|spose 7q = \lfloor {41\over 5}\rfloor = 8$, but 8 \cdot 8
> 10(41 - 40) + 0 = 10. Then we set $|spose 7q = 7$, and now
we find 7 \cdot 8 < 10(41 - 35) + 0 = 60. But 7 times 588 equals
4116, so the true quotient is $q = 6$. (Incidentally, this example
shows that Theorem B cannot be improved under the given hypotheses,
when $b = 10.)$
\ansno 23. Obviously $v\lfloor b/(v + 1)\rfloor < (v + 1)\lfloor
b/(v + 1)\rfloor ≤ (v + 1)b/(v + 1) = b$; also if $v ≥ \lfloor
b/2\rfloor$ we obviously have $v\lfloor b/(v + 1)\rfloor ≥ v
≥ \lfloor b/2\rfloor $. Finally, assume that $1 ≤ v < \lfloor
b/2\rfloor $. Then $v\lfloor b/(v + 1)\rfloor > v|newtype (b/(v
+ 1) - 1\bigrp ≥ b/2 - 1 ≥ \lfloor b/2\rfloor - 1$, because
$v\biglp b/(v + 1) - 1\bigrp - b/2 + 1 = (b/2 - v - 1)(v - 1)/(v
+ 1) ≥ 0$. Finally since $v\lfloor b/(v + 1)\rfloor > \lfloor
b/2\rfloor - 1$, we must have $v\lfloor b/(v + 1)\rfloor ≥ \lfloor
b/2\rfloor $.
\ansno 24. The probability is only log$↓b 2, not {1\over 2}.$
(For example, if $b = 2↑{35}$, the probability is approximately
${1\over 35}$; this is still high enough to warrant the special
test for $d = 1$ in steps D1 and D8.)
\ansno 25.
|qleft \3skip
|qleft ⊗{\sl 002⊗⊗}ENTA⊗1⊗1\cr
⊗{\sl 003}⊗⊗ADD⊗V+1⊗1\cr
⊗{\sl 004}⊗⊗STA⊗TEMP⊗1\cr
⊗{\sl 005⊗}⊗ENTA⊗1⊗1\cr
⊗{\sl 006⊗⊗}JOV⊗1F⊗1⊗Jump if $v↓1 = b - 1.\cr
⊗{\sl 007}⊗⊗$ENTX⊗0⊗1\cr
⊗{\sl 008}⊗⊗DIV⊗TE??m??;DIV⊗TEMP⊗1⊗Otherwise compute $b/(v↓1
+ 1).\cr
⊗${\sl 009⊗}⊗JOV⊗DIVBYZERO⊗1⊗Jump if $v↓1 = 0.\cr
⊗{\sl 010}⊗$1H⊗STA⊗D⊗1\cr
⊗{\sl 011}⊗⊗DECA⊗1⊗1\cr
⊗{\sl 012}⊗⊗JANZ⊗??+3⊗1⊗Jump if $d ≠ 1.\cr
⊗{\sl 013⊗⊗}STZ⊗U⊗1 - A\cr
⊗${\sl 014}⊗⊗JMP⊗D2⊗$1 - A\cr
⊗${\sl 015}⊗⊗ENT1⊗N⊗$A⊗$Multiply $v$ by $d.\cr
⊗${\sl 016⊗⊗}ENTX⊗0⊗$A\cr
⊗${\sl 017}⊗2H⊗STX⊗CARRY⊗$AN\cr
⊗${\sl 018}⊗⊗LDA⊗V,1⊗$AN\cr
⊗${\sl 019}⊗⊗MUL⊗D⊗$AN\cr
⊗ . . .⊗⊗⊗⊗⊗$(as in exercise 13)\cr
⊗{\sl 026⊗⊗}J1P⊗2B⊗$AN\cr
⊗${\sl 027}⊗⊗ENT1⊗M+N⊗$A⊗$(Now rX = 0.)\cr
⊗{\sl 028}⊗2H⊗STX⊗CARRY⊗$A(M + N)⊗$Multiply $u$ by $d.\cr
⊗${\sl 029}⊗⊗LDA⊗U,1⊗$A(M + N)\cr
⊗ . . .⊗⊗⊗⊗⊗$(as in exercise 13)\cr
⊗{\sl 037}⊗⊗J1P⊗2B⊗$A(M + N)\cr
⊗${\sl 038⊗⊗}STX⊗U⊗$A\cr
\3skip$ {\bf 26.} (See the algorithm of exercise 16.)
|qleft \3skip
|qleft \qquad {\sl 101⊗}D8⊗LDA⊗D⊗1\quad ⊗\quad (Remainder will
be left in loca-\cr
\qquad {\sl 102⊗⊗}DECA⊗1⊗1\quad ⊗\qquad tions U+M+1 through
U+M+N)\cr
\qquad {\sl 103}⊗⊗JAZ⊗DONE⊗1\quad ⊗\quad Terminate if $d = 1.\cr
\qquad {\sl 103}⊗⊗ENNI⊗N⊗A\quad ⊗\quad$ rI1 ≡ $j - n - 1; j
← 1.\cr$
\qquad {\sl 105}⊗⊗ENTA⊗O⊗$A\quad ⊗\quad r ← 0.\cr
\qquad$ {\sl 106}⊗1H⊗LDX⊗U+M+N+1,1⊗$AN\quad ⊗\quad$ rAX ← $rb
?? u↓{m+j}.\cr$
\qquad {\sl 107}⊗⊗DIV⊗D⊗$AN\quad \cr$
\qquad {\sl 108}⊗⊗STA⊗U+M+N+1,1⊗$AN\quad \cr$
\qquad {\sl 109}⊗⊗SLAX⊗5⊗$AN\quad ⊗\quad r ← (rb ?? u↓{m?}βj)$mod
$d$.\cr
\qquad {\sl 110}⊗⊗INC2⊗1⊗$AN\quad ⊗\quad j ← j ?? 1.\cr$
\qquad {\sl 111⊗}⊗J2N⊗1??⊗$AN\quad ⊗\quad$ Repeat for 1 ≤ $j
≤ n.\cr
\9skip$ At this point, the division routine is complete; and
by the next exercise, register AX is zero.
\ansno 27. It is $de$ mod $dv = d(u$ mod $v)$.
\ansno 28. For convenience, let us assume $v$ has a decimal
point at the left, that is, $vN46??????????????\3skip \bf 27.}
It is $de$ mod $dv = d(u$ mod $v)$.
\ansno 28. For convenience, let us assume $v$ has a decimal
point at the left, that is, $v = (v↓{0.}v↓{12} . . .)$. After
step N1 we have ${1\over 2}$ ≤ $v < 1 ?? 1/b:$ for
$$$v\left[{b + 1\over v↓1 \tau 1}} |tab ≤ {v(b + 1)\over v↓1
+ 1} = {v(1 + 1/b)\over (1/b)(v↓1 ?? 1)} < 1 + {1\over b},⊗$and⊗$\hfill
v\left[{b + 1\over v↓1 + 1}} ⊗≥ {v(b \tau 1 - v↓1)\over v↓1
+ 1} ≥ {1\over b} {v↓1(b + 1 - v↓1)\over v↓1 + 1}.\cr
\9skip$ The latter quantity takes its smallest value when $v↓1
= 1$, since it is a convex function and the other extreme value
is greater.
|qleft \qquad The formula in step N2 may be written
$$$v ← \left[{b(b + 1)\over v↓1 + 1}}{v\over b}$,
|qctr \9skip so we see as above that $v$ will never become ≥1
+ 1/$b$.
|qleft \qquad The minimum value of $v$ after one iteration of
step N2 is ≥
|qleft \9skip \left(${b(b + 1) - v↓1\over v↓1 + 1}$}${v\over
b}$ ≥ \left(${b(b + 1) - v↓1\over v↓1 ?? 1}$}${v↓1\over b↓2}$
|tab = \left(${b(b + 1) + 1 - t\over t}$}\left(${t - 1\over
b↑2}$}⊗\4skip ⊗= 1 + ${1\over b}$ ?? ${2\over b↑2}$ - ${1\over
b↑2}$\left(t + ${b(b + 1) + 1\over t}$},\cr
\9skip if $t = v↓1 + 1$. The minimum of this quantity occurs
for $t = b/2 + 1$; a lower bound is $1 - 3/2b$. Hence $v↓1 ≥
b - 2$, after one iteration of step N2. Finally, we have $(1
- 3/2b)(1 \tau 1/b)↑2 > 1$, when $b ≥ 5$, so at most two more
iterations are needed. The assertion is easily verified when
$b < 5$.
\ansno 29. True, since $(u↓j \ldotsm u↓{j+n})↓b < v$. (But in
Program D, $u↓j$ is left as $b - 1$ if step D6 is necessary,
since there was no need to reset $u↓j$ when it was never being
used in the subject calculation.)
\ansno 30. In Algorithms A and S, such overlap is possible if
the algorithms are rewritten slightly; e.g., in Algorithm A,
we could rewrite step A2 thus: ``Set $t ← u↓j + v↓j + k, w↓j
← t$ mod $b, k ← \lfloor t/b\rfloor .$''
|qleft \qquad In Algorithm M, $v↓j$ may be in the same location
as $w↓j$. In Algorithm D, it is most convenient (as in Program
D, exercise 26) to let $r↓1 \ldotsm r↓n$ be the same as $u↓{m+1}
\ldotsm u↓{m+n}$; and we can also have $q↓0 \ldotsm q↓m$ the same
as $u↓0 \ldotsm u↓m$, provided that no alteration of $u↓j$ is
made in step D6. (Such an alteration is unnecessary, as mentioned
in exercise 29.)
\ansno 31. One possibility is this, if we consider the situation
of Fig.\ 6 with $u = u↓ju↓{j+1} \ldotsm u↓{j+n}$ as in Algorithm
D: If the leading nonzero digits of $u$ and $v$ have the same
sign, set $r ← u - v, q ← 1$; otherwise set $r ← u + v, q ←
-1$. Now if $|r| > |u|$, or if $|r| = |u|$ and the first nonzero
digit of $u↓{j+n+1} \ldotsm u↓{m+n}$ has the same sign as the
first nonzero digit of $r$, set $q ← 0$; otherwise set $u↓j
\ldotsm u↓{j+n} ← r$.
\ansno 36. Values to 1000 decimal and 1100 octal places have
been computed by R. P. Brent, Tech. Rep. 47, Comp. Centre, Australian
Nat. Univ., Canberra, 1975.
%folio 760 galley 7b (C) Addison-Wesley 1978 -
|qleft \25skip |newtype {\sl }{\:r SECTION 4.3.2
|qleft }\11skip |newtype {\bf 1.} The solution is unique since
7 \cdot 11 \cdot 13 = 1001. The ``constructive'' proof of Theorem
C tells us that the answer is \biglp (11 \cdot 13)$↑6 + (7 \cdot
13)↑{10} + 5 \cdot (7 \cdot 11)↑{12}\bigrp mod 1001. This answer
DO WE WANT THIS EXCLAM PT? ↓
is perhaps not explicit enough! By (23) we have v↓1 = 1, v↓2
= (6 - 1) \cdot 8$ mod 11 = 7, v$↓3 = \biglp (5 - 1) \cdot 2
- 7\bigrp \cdot 6 mod 13 = 6, so u = 6 \cdot 7 \cdot 11 + 7
\cdot 7 + 1 = 512$.
\ansno 2. Yes (by the ``constructive'' proof given,
but not by the ``nonconstructive'' proof).
\ansno 3. $u ≡ u↓i $(modulo $m↓i)$ implies that $u ≡ u↓i$ \biglp
modulo gcd($m↓i, m↓j)\bigrp $, so the condition $u↓i ≡ u↓j \biglp$
modulo gcd($m↓i, m↓j)\bigrp$ must surely hold if there is a
solution. Furthermore if $u ≡ v$ (modulo $m↓j)$ for all $j$,
then $u - v$ is a multiple of lcm($m↓1, \ldotss , m↓r) = m$;
hence there is at most one solution.
|qleft \qquad The proof can now be completed in a nonconstructive
manner by counting the number of different sets $(u↓1, \ldotss
, u↓r)$ satisfying the conditions 0 ≤ u$↓j < m↓j$ and $u↓i ≡
u↓j \biglp$ modulo gcd$(m↓i, m↓j)\bigrp . If this number is
m$, there must be a solution since $(u$ mod$.m↓1, \ldotss , u$
mod $m↓r)$ takes on $m$ distinct values as $u$ goes from $a$
to $a + m$. Assume that $u↓1, \ldotss , u↓{r-1}$ have been chosen
satisfying the given conditions; we must now pick $u↓r ≡ u↓j$
\biglp modulo gcd$(m↓j, m↓r)\bigrp for 1 ≤ j < r$, and by the
generalized Chinese Remainder Theorem for $r - 1$ elements there
are
$$$m↓r/$lcm\biglp gcd$(m↓1, m↓r), \ldotss $, gcd$(m↓{r-1}, m↓r)\bigrp
|tab = m↓r/$gcd\biglp lcm$(m↓1, \ldotss , m↓{r-1}), m↓r\bigrp
⊗\4skip ⊗=$ lcm($m↓1, \ldotss , m↓r)/$lcm($m↓1, \ldotss , m↓{r-1})\cr
\9skip$ ways to do this. [This proof is based on identities
(10), (11), (12), and (14) of Section 4.5.2.]
|qleft \qquad A constructive proof [A. S. Fraenkel, {\sl Proc.\ Amer.\ Math.\ Soc.\
\bf 15} (1963), 790--791] generalizing (24) can be given
as follows. Let $M↓j =$ lcm$(m↓1, \ldotss , m↓j)$; we wish to
find $u = v↓rM↓{r-1} + \cdots + v↓2M↓1 + v↓1$, where 0 ≤ $v↓j
< M↓j/M↓{j-1}$. Assume that $v↓1, \ldotss , v↓{j-1}$ have already
been determined; then we must solve the congruence
$$$v↓jM↓{j-1} + v↓{j-1}M↓{j-2} + \cdots + v↓1 ≡ u↓j\quad$ (modulo
$m)$.
|qctr \9skip Gere $v↓{j-1}M↓{j-2} + \cdots + v↓1 ≡ u↓i ≡ u↓j$
\biglp modulo gcd$(m↓i, m↓j)\bigrp$ for $i < j$ by hypothesis,
so $c = u↓j - (v↓{j-1}M↓{j-2} + \cdots + v↓1)$ is a multiple
of
|qleft β?????????|newtype 58320---Computer
%folio 761 galley 8 (C) Addison-Wesley 1978 -
Programming\quad (Knuth/Addison-Wesley)\quad f. 761\quad Ans\quad
g. 8
|qleft \9skip lcm\biglp gcd($m↓1, m↓j), \ldotss $, gcd$(m↓{j-1},
m↓j)\bigrp =$ gcd($M↓{j-1}, m↓j) = d↓j$.
|qctr \9skip We therefore must solve $v↓jM↓{j-1} ≡ c$ (modulo
$m↓j)$. By Euclid's algorithm there is a number $c↓j$ such that
$c↓jM↓{j-1} ≡ f↓j($modulo $m↓j)$; hence we may take
$$$v↓j = (c↓jc)/d↓j$ mod$(m↓j/d↓j)$.
|qctr \9skip Note that, as in the nonconstructive proof, we
have $m↓j/d↓j = M↓j/M↓{j-1}$.
\ansno 4. (After $m↓4 = 91 = 7 \cdot 13$, we have
used up all products of two or more odd primes that can be less
than 100, so $m↓5, . . $. must all be prime.)
|qleft \9skip $m↓7 |tab = 79,\quad m↓8 |tab = 73,\quad m↓9 |tab
= 71,\quad m↓{10} |tab = 67,\quad m↓{11} |tab = 61,⊗\4skip \hfill
m↓{12} ⊗= 59,\hfill m↓{13} ⊗= 53,\hfill m↓{14} ⊗= 47,\hfill
m↓{15} ⊗= 43,\hfill m↓{16} ⊗= 41,\cr
\4skip \hfill m↓{17} ⊗= 37,\hfill m↓{18} ⊗= 31,\hfill m↓{19}
⊗= 29,\hfill m↓{20} ⊗= 23,\hfill m↓{21} ⊗= 17,\cr
\9skip$ and then we are stuck $(m↓{22} = 1$ does no good).
\ansno 5. No; an obvious upper bound is
$$3$↑45↑27↑211↑1 \cdots = \prod ↓{p$odd??$p$prime} $p↑|"l$$↑{log}|infsup
p↑{100|}"L$.
|qctr \9skip This upper bound is attained if we choose $m↓1
= 3↑4, m↓2 = 5↑2$, etc. (It is more difficult, however, to maximize
$m↓1 \ldotsm m↓r$ when $r$ is fixed, or to maximize $m↓1 + \cdots
+ m↓r$ as we would attempt to do using moduli $2↑m|infsup j
- 1.)$
\ansno 6. (a) If $e = f + kg$, then $2↑e = 2↑f(2↑g)↑k
≡ 2↑f \cdot 1↑k$ (modulo 2$↑g - 1)$. So if $2↑e ≡ 2↑f$ (modulo
2$↑g - 1)$, then 2$↑$${emodg} ≡ 2↑$${fmodg}$ (modulo 2$↑g -
1)$, and since the latter quantities lie between zero and $2↑g
- 1$ we must have $e$ mod $g - f$ mod $g$.
(b) By part (a),
$$(1 + 2$↑d + \cdots + 2↑{(c-1)d}) \cdot (2↑e - 1) ≡ (1 + 2↑d
+ \cdots + 2↑{(c-1)d}) \cdot (2↑d - 1)$
|qleft \3skip = 2$↑{cd} - 1 ≡ 2↑{cc} - 1 O 2↑1 - 1 = 1\quad$
(modulo 2$↑f - 1)$.
|qright \3skip {\bf 7.} \biglp $u↓j - (v↓1 + m↓1(v↓2 + m↓2(v↓3
+ \cdots + m↓{j-2}v↓{j-1}) \cdot \cdot \cdot ))\bigrp c↓{1j}
\ldotsm c↓$
{(j-1)j}|qleft \qquad \qquad |tab = (u$↓j - v↓1)c↓{1j} \ldotsm
c↓{(j-1)j} - m↓1v↓2c↓1 \ldotsm c↓{(j-1)j} - \cdots ⊗\3skip -
m↓1 \ldotsm m↓{j-2}v↓{j-1}c↓{1j} \ldotsm c↓{(j-1)j}⊗\3skip ⊗≡
(u↓j - v↓1)c↓{1j} \ldotsm c↓{(j-1)j} - v↓2c↓{2j} \ldotsm c↓{(j-1)j}
- \cdots - v↓{j-1}c↓{(j-1)j}\cr
\3skip ⊗= \biglp \cdots ((u↓j - v↓1)c↓{1j} - v↓2)c↓{2j} - \cdots
- v↓{j-1}\bigrp c↓{(j-1)j}\quad$ (modulo $m↓j).\cr
\9skip$ \qquad This method of rewriting the formulas uses the
same number of arithmetic operations and fewer constants; but
the number of constants is fewer only if we order the moduli
so that $m↓1 < m↓2 < \cdots < m↓r$, otherwise we would need
a table of $m↓i$ mod $m↓j$. This ordering of the moduli might
seem to require more computation than if we made $m↓1$ the largest,
$m↓2$ the next largest, etc., since there are many more operations
to be done modulo $m↓r$ than modulo $m↓1$; but sine $v↓j$ can
be as large as $m↓j - 1$, we are better off with $m↓1 < m↓2
< \cdots < m↓r$ in (23) also. So this idea appears to be preferable
to the formulas in the text, although the formulas in the text
are advantageous when the moduli have the form (14), as shown
in Section 4.3.3.
\ansno 8. $m↓{j-1} \ldotsm m↓1v↓j ≡ m↓{j-1} \ldotsm m↓1\biglp
\cdots ((u↓j - v↓1)c↓{1j} - v↓2)c↓{2j} - \cdots - v↓{j-1}\bigrp
c↓{(j-1)j} ≡ m↓{j-2} \ldotsm m↓1\biglp \cdots (u↓j - v↓1)c↓{1j}
- \cdots - v↓{j-2}\bigrp c↓{(j-2)j} - v↓{j-1}m↓{j-2} \ldotsm
m↓1 ≡ \cdots ≡ u↓j - v↓1 - v↓2m↓1 - \cdots - v↓{j-1}m↓{j-2}
\ldotsm m↓1\quad$ (modulo $m↓j)$.
\ansno 9. u$↓r ← \biglp (\ldotsm (v↓rm↓{r-1} + v↓{r-1})m↓{r-2}
+ \cdot \cdot \cdot )m↓1 + v↓1\bigrp$ mod $m↓r, \ldotss $,
|qleft u$↓2 ← (v↓2m↓1 + v↓$1mod $m↓2, u↓1 ← v↓1$ mod $m↓1$.
|qright \9skip (The computation should be done in this order,
if we want to let $u↓j$ and $v↓j$ share the same memory locations
as is possible in (23).\bigrp
\ansno 10. If we redefine the ``mod'' operator so that it produces
residues in the symmetrical range, the basic formulas (2), (3),
(4) for arithmetic and (23), (24) for conversiond remain the
same, and the number $u$ in (24) lies in the desired range (10).
$\biglp$Here (24) is a {\sl balanced mixed-radix} notation, generalizing
``balanced ternary'' notation.$\bigrp$ The comparison of two numbers
may still be done from left to right, in the simple manner described
in the text. Furthermore, it is possible to retain the value
$u↓j$ in a single computer word, if we have signed magnitude
representation within the computer, even if $m↓j$ is almost
twice the word size. But the arithmetic operations analogous
to (11) and (12) are more difficult, so it appears that on most
computers this idea would result in slightly slower operation.
\ansno 11. Multiply by
|qleft ${m + 1\over 2} = \left({m↓1 + 1\over 2}, \ldotss , {m↓r
+ 1\over 2}}$.
|qctr Note that
|qleft $2t \cdot {m + 1\over 2} ≡ t\quad$ (modulo $m)$.
|qctr \9skip In general if $v$ is relatively prime to $m$, then
we can find (by Euclid's algorithm) a number $v↑\prime = (v↑{↑\prime}↓{1},
\ldotss , v↑{↑\prime}↓{r})$ such that $vv↑\prime ≡ 1$ (modulo
$m)$; and then if $u$ is known to be a multiple of $v$ we have
$u/v = uv↑\prime $, where the latter is computed with modular
multiplication. When $v$ is not relatively prime to $m$, division
is much more difficult.
\ansno 12. Obvious from (11), if we replace $m↓j$ by $m$.
\ansno 13. (a) $x↑2 - x = (x - 1)x ≡ 0$ (modulo 10$↑n)$ is equivalent
to $(x - 1)x ≡ 0$ (modulo $p↑n)$ for $p = 2$ and 5. Either $x$
or $x - 1$ must be a multiple of $p$, and then the other is
relatively prime to $p↑n$; so either $x$ or $x - 1$ must be
a multiple of $p↑n$. If $x$ mod $2↑n = x$ mod $5↑n = 0$ or 1,
we must have $x = 0$ or 1; hence $x$ mod $2↑n ≠ x$ mod $5↑n$.\xskip
(b) If $x = qp↑n + r$, where $r = 0$ or 1, then $r ≡ r↑2 ≡ r↑3$,
so $3x↑2 - 2x↑3 ≡ (6qp↑nr + 3r) - (6qp↑nr + 2r) ≡ r$(modulo
$p↑n)$.\xskip (c) Let $c↑\prime = \biglp 3(cx)↑2 - 2(cx)↑3\bigrp /x↑2
= 3c↑2 - 2c↑3x$.
|qleft \qquad {\sl Note{\sl :}} Since the last $k$ digits of
an $n$-digit automorph form a $k$-digit automorph, it makes
sense to speak of the two ∞-digit automorphs, $x$ and 1 - $x$,
which are 10-adic numbers (cf.\ exercise 4.1--31). Th??\quad
??\quad ??\quad ??\quad ??\quad ??\quad ??\quad ??\quad ??\quad
??\quad ??\quad ??\quad ??\quad ??\quad ??\quad ??\quad ??\quad
??\quad ??\quad ??\quad ??\quad ??\quad ??\quad ??\quad ??\quad
??\quad ??\quad ??\quad ??\quad ??\quad ??\quad ??\quad ??\quad
??\quad ??\quad